ABSTRACT

Gaussianising the one-point distribution of the weak gravitational lensing convergence 
has recently been shown to increase the signal-to-noise contained in two-point statis- 
tics. We investigate the information on cosmology that can be extracted from the 
transformed convergence fields. Employing Box-Cox transformations to determine op- 
timal transformations to Gaussianity, we develop analytical models for the transformed 
power spectrum, including effects of noise and smoothing. We find that optimised
Box-Cox transformations perform substantially better than an offset logarithmic
transformation in Gaussianising the convergence, but both yield very similar results for the
signal-to-noise. None of the transformations is capable of eliminating correlations of 
the power spectra between different angular frequencies, which we demonstrate to have 
a significant impact on the errors on cosmology. Analytic models of the Gaussianised 
power spectrum yield good fits to the simulations and produce unbiased parameter 
estimates in the majority of cases, where the exceptions can be traced back to the lim- 
itations in modelling the higher-order correlations of the original convergence. In the 
idealistic case, without galaxy shape noise, we find an increase in cumulative
signal-to-noise by a factor of 2.6 for angular frequencies up to $\ell = 1500$, and a decrease in
the area of the confidence region in the $\Omega_{\rm m}$-$\sigma_8$ plane, measured in terms of q-values,
by a factor of 4.4 for the best-performing transformation. When adding a realistic 
level of shape noise, all transformations perform poorly with little decorrelation of 
angular frequencies, a maximum increase in signal-to-noise of 34%, and even slightly 
degraded errors on cosmological parameters. We argue that, to find Gaussianising 
transformations of practical use, it will be necessary to go beyond transformations 
of the one-point distribution of the convergence, extend the analysis deeper into the 
non-linear regime, and resort to an exploration of parameter space via simulations. 

Key words: methods: data analysis - methods: analytical - methods: statistical -
cosmological parameters - gravitational lensing: weak - large-scale structure of Universe



1 INTRODUCTION 

Weak gravitational lensing of distant galaxies by the large-scale structure is
considered one of the most powerful probes of cosmological physics (Albrecht et al. 2006;
Peacock et al. 2006; see Munshi et al. 2008 for a recent review). Planned surveys from the
ground (e.g. LSST) and from space (e.g. Euclid) will measure the dark energy equation of
state, properties of dark matter, and possible deviations from general relativity with
unprecedented precision, reaching percent-level accuracy on some parameters (e.g.
Refregier et al. 2010).



The standard analysis employs two-point statistics of 
the gravitational shear, which would fully specify the prop- 
erties of the underlying matter distribution if it were dis- 
tributed according to a Gaussian random field. However, 
non-linear structure formation induces correlations between 
different angular scales in Fourier space and hence re- 
duces the cosmological information contained in weak lens- 
ing two-point statistics. At the same time extra information 
is generated in higher-order statistics of the shear, which, 
if it can be extracted, improves parameter constraints. 
The most widespread approaches to exploit higher-order
correlations make use of shear three-point statistics (e.g.
Semboloni et al. 2011) and peak statistics (e.g. Berge et al. 2008).



The goal of highly precise inference on cosmological 
parameters entails the need for access to the constraining 
power created by non-linear effects on the shear fields in 
an effective way, and for the guarantee that none of the 
steps in the observations and analysis introduce uncertainty 
or systematics at a level which significantly affects errors 
on cosmology. These requirements have raised a range of 
issues driving current research, among them an efficient
choice of higher-order statistic (Berge et al. 2010), the determination
of accurate covariances (Takada & Jain 2009; Pielorz et al. 2010;
Kiessling et al. 2011), and the derivation of the functional form of the
likelihood (Hartlap et al. 2009; Schneider & Hartlap 2009).

The problems listed above can, at least in principle, 
all be solved if one could find a bijective mapping of the 
observed gravitational shear field, or equivalently the weak 
lensing convergence field, such that the transformed field is 
described by a Gaussian random field. This field is com- 
pletely determined by its power spectrum which conse- 
quently contains all cosmological information present in the 
original field. Therefore only two-point statistics have to be 
considered in the likelihood analysis whose covariance can 
also be expressed in terms that are second-order in the shear 
or convergence. In addition, the common assumption of a 
simple Gaussian likelihood becomes exact when formulating 
it for the transformed convergence (for a similar ansatz in 
the context of cosmic microwave background temperature 
fluctuations see Bond et al. 2000).

The recent work by Seo et al. (2011) suggests that such
a beneficial transformation is approximately realised by taking
the logarithm of the positively offset convergence, decorrelating
angular frequencies and boosting the signal-to-noise
in the transformed power spectrum. Logarithmic transformations
are widely used in statistics to reduce the skewness
in distributions, which in the context of large-scale structure
is caused by the excess of high-density regions due
to non-linear evolution. Coles & Jones (1991) provided a
heuristic physical justification by demonstrating that the
one-point distribution of the matter density contrast, $\delta$,
is lognormal in Lagrangian coordinates if one assumes the
Zel'dovich approximation and Gaussian initial conditions for
the matter density and velocity fields (for an exact calculation
of the one-point distribution under these assumptions
see Kofman et al. 1994).



Kayo et al. (2001) showed by means of N-body simulations
that a lognormal model accurately describes the one-point
distribution of $\delta$ well into the non-linear regime. This
fact has fostered the use of the lognormal distribution, or
equivalently $\ln(1 + \delta)$ as a 'natural' variable, in the
modelling of large-scale structure (e.g. Szapudi & Kaiser 2003;
Kitaura et al. 2010). The weak lensing convergence $\kappa$ is a
weighted projection of the matter density contrast and hence
should inherit a skewed shape of the one-point distribution,
although its minimum varies because $\kappa$ does not in practice
reach its theoretical lower limit, as opposed to $\delta \geq -1$ (see
the discussion in Taruya et al. 2002). Indeed Taruya et al.
(2002) found that $\kappa$ is well described empirically as lognormally
distributed, with some deviations reported for high
source galaxy redshifts and the tails of the convergence distribution.



As presented in Neyrinck et al. (2009, 2011), Gaussianising
the one-point distribution of the matter density
contrast via logarithmic transformation, or by matching
the cumulative distribution function to a Gaussian one
(referred to as rank-order Gaussianisation henceforth), increases
the signal-to-noise in the transformed matter power
spectrum. This result has triggered the analogous studies
on the weak lensing convergence by Seo et al. (2011),
considering logarithmic transformations, and by Yu et al.
(2011), who investigate the bispectrum and higher moments
of the rank-order Gaussianised convergence. Note also the
approach of Zhang et al. (2011) and Yu et al. (2010), who
apply a non-linear Wiener filter to the matter density contrast
and the convergence, respectively, instead of transforming
these quantities. Recently, Neyrinck (2011) presented
a simulation-based study of the effect on cosmological parameters
due to the Gaussianisation of the matter density
contrast, finding significantly improved constraints in some
cases.

This work is aimed at elucidating the cosmological infor- 
mation content of transformed convergence fields and their 
ability to constrain cosmological parameters using analyti- 
cal models. To this end, we employ Box-Cox transformations 
which encompass a range of transformations frequently ap- 
plied in statistics, including the logarithm, and which pro- 
vide us with an efficient maximum likelihood formalism to 
estimate their free parameters. This allows us to quantify 
how well logarithmic transformations fare in Gaussianising 
the one-point distribution of $\kappa$, and derive optimal transformations.

Contrary to rank-order Gaussianisation, it is conceptu- 
ally easy to determine the statistics of the transformed con- 
vergence analytically for a parametrised form as given by the 
Box-Cox transformations. We assess the accuracy and limi- 
tations of our models in fitting the power spectra obtained 
from a large suite of simulations of transformed convergence 
fields, and investigate the constraints on cosmological pa- 
rameters that can be achieved with these models for differ- 
ent transformations as well as convergence maps with and 
without galaxy shape noise. 

The article is structured as follows: In Section 2 we summarise
the simulations underlying our analysis. Section 3
describes the transformations we apply to the weak lensing
convergence and their optimisation, as well as the modelling
of the statistics of the transformed fields. Our results for
optimal transformations, the analytical models of the power
spectrum of the transformed fields, the noise properties, and
the constraints on cosmology are presented in Section 4. In
Section 5 we discuss our findings and their implications, before
summarising and concluding in Section 6.



2 WEAK LENSING SIMULATIONS 

To test the performance of our transformations and models,
and perform a mock likelihood analysis, we use 100 independent
realisations of weak lensing convergence fields generated
by the SUNGLASS pipeline (Kiessling et al. 2011).
They are based on medium-resolution dark matter-only N-body
simulations with a box size of $512\,h^{-1}$ Mpc and periodic
boundary conditions. The simulations are populated by $512^3$
particles with mass $7.5 \times 10^{10}\,M_\odot$, using a force softening
length of $33\,h^{-1}$ kpc. A flat $\Lambda$CDM cosmology with WMAP7
parameters (Jarosik et al. 2011), particularly $\Omega_{\rm m} = 0.27$ and
$\sigma_8 = 0.81$, is adopted.

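The weak lensing convergence is given by a weighted
integral over the matter density contrast, $\delta$ (see e.g.
Bartelmann & Schneider 2001),

$$ \kappa(\boldsymbol{\theta}) = \frac{3 H_0^2 \Omega_{\rm m}}{2 c^2} \int_0^{\chi_{\rm s}} {\rm d}\chi\, \frac{\chi\,(\chi_{\rm s} - \chi)}{\chi_{\rm s}}\, \frac{\delta(\chi \boldsymbol{\theta}, \chi)}{a(\chi)} , \tag{1} $$

where $\chi$ denotes comoving distance, $\chi_{\rm s}$ the comoving distance
to the sources, $H_0$ the Hubble constant, $c$ the speed of light, and $a$ the
cosmic scale factor. The convergence depends on the redshift $z_{\rm s}$ of the
galaxies which serve as sources for the weak lensing signal.
Usually the source galaxies follow a distribution in redshift,
but for simplicity we will assume a single source redshift
$z_{\rm s} = 1$, which is close to the median redshift of upcoming
surveys.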

SUNGLASS assumes the Born approximation and computes
the convergence directly via a discretised version of
equation (1), using the three-dimensional particle positions.
The light cone out to $z_{\rm s} = 1$ is constructed from 19 snapshots
with a separation of $128\,h^{-1}$ Mpc each. To avoid the
repetition of structure after spanning distances exceeding
the box size of the simulation, boxes are randomly translated
and rotated. The convergence maps are computed on
a grid with $N^2$ points with $N = 2048$, covering an area of
$A_{\rm field} \approx 10 \times 10\,{\rm deg}^2$. Since the realisations are fully
independent, we obtain a total survey size of $10,000\,{\rm deg}^2$ by jointly
analysing all convergence maps.
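As a quick arithmetic check of the map geometry quoted above, the pixel scale and the combined survey area follow directly from the grid and field sizes. The short Python sketch below only restates numbers given in the text; the constant and function names are illustrative and are not taken from the SUNGLASS code.

```python
# Map geometry as quoted in the text; names are illustrative, not from SUNGLASS.
N_GRID = 2048             # grid points per side of one convergence map
FIELD_SIDE_DEG = 10.0     # side length of one map in degrees
N_REALISATIONS = 100      # number of independent maps


def pixel_size_arcmin(field_side_deg=FIELD_SIDE_DEG, n_grid=N_GRID):
    """Side length of a single pixel in arcminutes."""
    return field_side_deg * 60.0 / n_grid


def total_area_deg2(field_side_deg=FIELD_SIDE_DEG, n_real=N_REALISATIONS):
    """Combined area of all independent realisations in square degrees."""
    return n_real * field_side_deg ** 2


if __name__ == "__main__":
    print(f"pixel size: {pixel_size_arcmin():.3f} arcmin")
    print(f"total area: {total_area_deg2():.0f} deg^2")
```

The resulting pixel scale of roughly 0.29 arcmin agrees with the value of approximately 0.3 arcmin quoted in Section 3.2.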

We are also interested in considering convergence maps 
with a realistic level of noise which in weak lensing is gov- 
erned by the random distribution of intrinsic galaxy shapes. 
To add shape noise, the convergence fields are Fourier-transformed
and converted to shear fields via

$$ \gamma(\boldsymbol{\ell}) = {\rm e}^{2 {\rm i} \varphi_\ell}\, \kappa(\boldsymbol{\ell}) , \tag{2} $$

where $\gamma$ denotes the complex gravitational shear (quantifying
the ellipticity and the position angle of the galaxy
shape), and $\varphi_\ell$ the polar angle of the angular frequency
vector $\boldsymbol{\ell}$. Note that, to ease the notation, we will throughout use
the same symbol to designate quantities and their respective
Fourier transforms.
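The conversion between convergence and shear in Fourier space amounts to a pure phase rotation of each mode. The following Python sketch illustrates this for a single mode, assuming the standard flat-sky relation in which the shear mode equals the convergence mode times exp(2i phi), with phi the polar angle of the frequency vector; the function names are hypothetical.

```python
import cmath
import math


def shear_from_convergence_mode(kappa_ell, ell_x, ell_y):
    """Multiply one Fourier mode of kappa by exp(2 i phi_ell)."""
    phi = math.atan2(ell_y, ell_x)          # polar angle of the vector ell
    return kappa_ell * cmath.exp(2j * phi)


def convergence_from_shear_mode(gamma_ell, ell_x, ell_y):
    """Inverse relation: remove the phase factor again."""
    phi = math.atan2(ell_y, ell_x)
    return gamma_ell * cmath.exp(-2j * phi)
```

For a mode along the x-axis the shear equals the convergence, while a mode along the y-axis flips the sign, which reflects the spin-2 nature of the shear.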

After Fourier-transforming back to real space, an intrinsic
ellipticity is added to each shear component at every grid
point of the shear map by randomly drawing values from a
Gaussian distribution with dispersion $\sigma_\epsilon / \sqrt{2 n_{\rm gal} A_{\rm field} / N^2}$,
where $\sigma_\epsilon = 0.35$ is a typical intrinsic ellipticity dispersion,
and $n_{\rm gal} = 30\,{\rm arcmin}^{-2}$ the assumed number density of
galaxies. Inverting equation (2), one readily calculates the
convergence from the shear maps with shape noise included
by again transforming to Fourier space and back.
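To make the noise level concrete, the per-pixel dispersion can be evaluated from the quoted survey parameters. The Python sketch below assumes the dispersion is the intrinsic ellipticity dispersion divided by the square root of twice the mean number of galaxies per pixel, and uses a field of exactly 10 x 10 square degrees; all names are illustrative.

```python
import math
import random

SIGMA_EPS = 0.35                          # intrinsic ellipticity dispersion
N_GAL = 30.0                              # galaxies per arcmin^2
FIELD_AREA_ARCMIN2 = (10.0 * 60.0) ** 2   # 10 x 10 deg^2 expressed in arcmin^2
N_GRID = 2048                             # grid points per side


def pixel_noise_dispersion(sigma_eps=SIGMA_EPS, n_gal=N_GAL,
                           area_arcmin2=FIELD_AREA_ARCMIN2, n_grid=N_GRID):
    """Dispersion of the Gaussian noise added to each shear component per pixel."""
    galaxies_per_pixel = n_gal * area_arcmin2 / n_grid ** 2
    return sigma_eps / math.sqrt(2.0 * galaxies_per_pixel)


def draw_pixel_noise(n_pixels, rng):
    """Draw independent noise values for both shear components of n_pixels pixels."""
    sigma = pixel_noise_dispersion()
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_pixels)]


if __name__ == "__main__":
    print(f"noise dispersion per component and pixel: {pixel_noise_dispersion():.3f}")
    print(draw_pixel_noise(2, random.Random(0)))
```

With these numbers each pixel holds about 2.6 galaxies on average, so the noise per pixel (about 0.15) is far larger than typical convergence values, which is why smoothing matters in the noisy case.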

Since we work with the convergence maps, the conver- 
gence power spectrum is the two-point statistic of choice for 
the subsequent likelihood analysis. We employ the estimator 



(3) 
where $\Delta\ell$ is the width of the angular frequency bin, which
we choose to be constant in log-space with $\ln \Delta\ell \approx 0.23$.
We use the notation $\kappa_{\boldsymbol{\ell}}$ for the convergence values on a
discretely Fourier-transformed grid; see also Appendix C. The
sum runs over all angular frequency vectors which lie in a
shell with central radius $\ell$ and width $\Delta\ell$. To avoid aliasing
in the power spectrum due to the edges of the convergence
fields, a Hann window is applied to the convergence values
in the margins covering the outermost 12.5 % of the maps.
For further details on the simulations and power spectrum
estimation we refer the reader to Kiessling et al. (2011).
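The edge tapering described above can be sketched in one dimension as follows. The text only states that a Hann window covers the outermost 12.5 % of the maps, so the exact ramp used here (a half-Hann rise sampled at pixel centres) is an assumption, and the function name is hypothetical. A two-dimensional window would be the outer product of two such weight vectors.

```python
import math


def hann_margin_weights(n_pixels, margin_fraction=0.125):
    """Weights that are 1 in the interior and fall off Hann-like in both margins."""
    n_margin = int(round(margin_fraction * n_pixels))
    weights = [1.0] * n_pixels
    for i in range(n_margin):
        # Half-Hann ramp: close to 0 at the map edge, close to 1 at the inner boundary.
        ramp = 0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / n_margin))
        weights[i] = ramp
        weights[n_pixels - 1 - i] = ramp
    return weights
```

For a 2048-pixel side this tapers 256 pixels at each edge and leaves the central 75 % of each row untouched.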



3 TRANSFORMATIONS OF CONVERGENCE 
AND POWER SPECTRUM 

In the following we will detail the transformations that we 
apply to Gaussianise the convergence maps, including the 
procedure to estimate the free parameters in the transfor- 
mation equations. We then proceed to express the power 
spectrum of the transformed convergence in terms of the 
statistics of the original convergence, the central prerequisite 
that will allow us to compute the analytical models required for
the likelihood analysis. 



3.1 Box-Cox transformations 



Box & Cox (1964) introduced a parametrised set of power
transformations that are widely used in statistical data analysis,
encompassing the logarithmic transformation which
has recently gained attention in attempts to boost the information
in cosmological density fields. For a given random
sample of data, in our case the $n = N^2$ grid point values
of the convergence in one map, the Box-Cox transformation
reads

$$ \bar\kappa_i(\lambda, a) = \begin{cases} \left[ (\kappa_i + a)^\lambda - 1 \right] / \lambda & \lambda \neq 0 \\ \ln(\kappa_i + a) & \lambda = 0 \end{cases} \tag{4} $$



for all $i = 1, \dots, n$. We will consider both the power $\lambda$ and
the shift $a$ as free parameters of the transformation. Note
that the transformed convergence is denoted by a bar, and
that the dependence on $\lambda$ and $a$ will mostly be suppressed
henceforth. The normalisation of equation (4) is chosen such
that the transformation is continuous in $\lambda$ at $\lambda = 0$. We have
illustrated the mapping according to equation (4) for a few
exemplary cases in Fig. 1.
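The shifted Box-Cox transformation of equation (4) is simple to state in code. The following Python sketch is a minimal illustration for a single convergence value; the function name and the error handling for non-positive arguments are choices made here, not part of the original analysis.

```python
import math


def box_cox(kappa, lam, shift):
    """Shifted Box-Cox transform of one convergence value.

    Returns ln(kappa + shift) for lam == 0 and ((kappa + shift)**lam - 1) / lam
    otherwise; the argument kappa + shift must be positive.
    """
    shifted = kappa + shift
    if shifted <= 0.0:
        raise ValueError("kappa + shift must be positive")
    if lam == 0.0:
        return math.log(shifted)
    return (shifted ** lam - 1.0) / lam
```

Evaluating the function for a very small non-zero power reproduces the logarithm to high accuracy, which illustrates the continuity at $\lambda = 0$ mentioned above.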

The Box-Cox parameters $(\lambda, a)$ shall be determined
from the sample $\{\kappa_i\}$ such that the one-point distribution
of transformed convergence values, $\mathcal{P}_{\rm 1pt}(\bar\kappa)$, is Gaussian.
The relation to the distribution of the original convergence
is given by

$$ P_{\rm 1pt}(\kappa) = \mathcal{P}_{\rm 1pt}(\bar\kappa) \prod_{i=1}^{n} (\kappa_i + a)^{\lambda - 1} , \tag{5} $$



where the last term is the Jacobian of the Box-Cox transformation.
Equation (5) provides a model for the distribution
$P_{\rm 1pt}(\kappa)$ from which the data sample $\{\kappa_i\}$ is drawn,
featuring only the two Box-Cox parameters and the mean
and variance of the assumed Gaussian $\mathcal{P}_{\rm 1pt}(\bar\kappa)$ as undetermined
parameters. Employing the maximum likelihood estimators
for mean and variance, one can derive the concentrated
log-likelihood for $\lambda$ and $a$ (Box & Cox 1964; see also
Joachimi & Taylor 2011), resulting in

$$ L_{\rm max}(\lambda, a) = -\frac{n}{2} \ln \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ \bar\kappa_i(\lambda, a) - \langle \bar\kappa(\lambda, a) \rangle \right]^2 \right\} + (\lambda - 1) \sum_{i=1}^{n} \ln(\kappa_i + a) . \tag{6} $$

Here, the term in curly brackets is the maximum likelihood
estimate for the variance of the transformed convergence
field, the angular brackets denoting the mean. Note that the
exponential of the Gaussian likelihood is unity if the maximum
likelihood estimate of the mean is unbiased. Equation
(6) constitutes a model for the distribution of the original
convergence with only $\lambda$ and $a$ as free parameters. Maximising
this equation with respect to the Box-Cox parameters
provides us with maximum-likelihood estimates for $\lambda$ and $a$
and thus with a method to determine an optimal transformation
to Gaussianity which is entirely driven by the data
itself.

Note that Box-Cox transformations are not limited to one-dimensional
distributions. We follow earlier work by concentrating
on transforming the one-point distribution only, but discuss
possible ways beyond this ansatz in Section 5.

Figure 1. Illustration of Box-Cox transformations. Shown is the
mapping of the shifted original convergence $\kappa + a$ to the transformed
convergence $\bar\kappa$ for several values of $\lambda$. Note that $\lambda = 0$
corresponds to the logarithmic transformation, and that $\lambda = 1$
leaves the convergence unchanged (except for an offset by $a$). In
addition we have plotted a log-arctan transformation with s = 3
as grey dashed curve; see Section 4.5 for details.
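The concentrated log-likelihood for the Box-Cox parameters can be evaluated directly from a sample of convergence values. The Python sketch below assumes the standard Box & Cox (1964) form, i.e. minus n/2 times the logarithm of the maximum likelihood variance of the transformed values plus the Jacobian term; the brute-force grid search and the toy lognormal data are illustrative stand-ins, not the optimiser or data used in the paper.

```python
import math
import random


def concentrated_loglike(kappa, lam, shift):
    """Concentrated Box-Cox log-likelihood for a list of convergence values."""
    shifted = [k + shift for k in kappa]
    if min(shifted) <= 0.0:
        return -math.inf                      # transformation undefined
    logs = [math.log(x) for x in shifted]
    if lam == 0.0:
        transformed = logs
    else:
        transformed = [(math.exp(lam * v) - 1.0) / lam for v in logs]
    n = len(kappa)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n   # ML variance estimate
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(logs)


def best_on_grid(kappa, lam_grid, shift_grid):
    """Brute-force maximisation over a small (lambda, a) grid."""
    candidates = ((concentrated_loglike(kappa, lam, a), lam, a)
                  for lam in lam_grid for a in shift_grid)
    return max(candidates)[1:]


# Toy data (assumption): shifted lognormal values, for which lambda = 0 is optimal.
rng = random.Random(1)
TRUE_SHIFT = 0.07
toy_kappa = [math.exp(rng.gauss(-2.78, 0.5)) - TRUE_SHIFT for _ in range(5000)]
lam_hat, a_hat = best_on_grid(toy_kappa, [-1.0, -0.5, 0.0, 0.5, 1.0], [TRUE_SHIFT])
```

For real maps with millions of pixels a vectorised implementation and a proper optimiser would be preferable; the grid search merely makes the structure of the estimator explicit.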

Note that the Box-Cox transformation changes the di- 
mension of the data set under consideration which could be 
corrected for e.g. by dividing by a function of the geometric 
mean of the original data set (Box & Cox 1964). However,
since $\kappa$ is dimensionless, we prefer to keep the simplest possible
form of the transformation as given by equation (4).
Unlike the original convergence, $\bar\kappa$ has a non-vanishing mean
on average which generally is a function of all moments of 
the original field. In principle this is irrelevant as there is 
no cosmological information in the mean, but due to the fi- 
nite size of the convergence maps large-scale modes might 
become biased. Hence we correct all transformed fields to a 
mean of zero. 



3.2 Transformed power spectrum 

One of the major advantages of Box-Cox transformations
(including the logarithmic transformation) over rank-order
Gaussianisation techniques as e.g. applied by Neyrinck et al.
(2009) and Yu et al. (2011) is the analytical relation between
original and transformed values in each grid point of the
field. This permits us to calculate the power spectrum of the
Box-Cox transformed convergence in terms of the statistics
of the original convergence, which in turn are computed from
the cosmological model. We begin by Taylor-expanding the
term in parentheses appearing in equation (4) for $\lambda \neq 0$,

$$ (\kappa + a)^\lambda = a^\lambda + \lambda\, a^{\lambda-1} \kappa + \frac{\lambda(\lambda-1)}{2}\, a^{\lambda-2} \kappa^2 + \frac{\lambda(\lambda-1)(\lambda-2)}{6}\, a^{\lambda-3} \kappa^3 + \mathcal{O}(\kappa^4) . \tag{7} $$



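With this result the Fourier transform of the transformed
convergence reads, to the same order in $\kappa$,

$$
\begin{aligned}
\bar\kappa(\boldsymbol{\ell}) ={}& \frac{1}{\lambda} \left( a^\lambda - 1 \right) (2\pi)^2 \delta_{\rm D}^{(2)}(\boldsymbol{\ell}) + a^{\lambda-1} \kappa(\boldsymbol{\ell}) + \frac{\lambda-1}{2}\, a^{\lambda-2} \int \frac{{\rm d}^2\ell_1}{(2\pi)^2}\, \kappa(\boldsymbol{\ell}_1)\, \kappa(\boldsymbol{\ell} - \boldsymbol{\ell}_1) \\
&+ \frac{(\lambda-1)(\lambda-2)}{6}\, a^{\lambda-3} \int \frac{{\rm d}^2\ell_1}{(2\pi)^2} \int \frac{{\rm d}^2\ell_2}{(2\pi)^2}\, \kappa(\boldsymbol{\ell}_1)\, \kappa(\boldsymbol{\ell}_2)\, \kappa(\boldsymbol{\ell} - \boldsymbol{\ell}_1 - \boldsymbol{\ell}_2) + \mathcal{O}(\kappa^4) ,
\end{aligned} \tag{8}
$$

where $\delta_{\rm D}^{(2)}(\boldsymbol{\ell})$ denotes the Dirac-delta distribution. To arrive
at this result, we applied the convolution theorem several
times. Note that the corresponding calculation for the
case $\lambda = 0$, i.e. based on the expansion of $\ln(\kappa + a)$, yields
the same result except for the zeroth-order term. This term
is generated by the non-vanishing mean of the transformed
convergence, but contributes due to the Dirac-delta function
only to the DC ($\ell = 0$) component of the power spectrum
and is hence irrelevant. As an aside, since we correct $\bar\kappa$ to
zero mean, there are in principle more terms contributing to
the zeroth order in equation (8), but we have omitted this
step to keep the formalism simple.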
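A numerical comparison helps to gauge how well the third-order expansion approximates the exact transformation. The Python sketch below compares the exact Box-Cox transform with its expansion in powers of $\kappa$ around zero; the function names are illustrative, and the parameter values in the usage note are taken from the BC1 row of Table 1. Since the series is effectively an expansion in $\kappa/a$, it can only be expected to converge for $|\kappa| < a$.

```python
def box_cox_exact(kappa, lam, shift):
    """Exact shifted Box-Cox transform for lam != 0."""
    return ((kappa + shift) ** lam - 1.0) / lam


def box_cox_third_order(kappa, lam, shift):
    """Expansion of the shifted Box-Cox transform up to third order in kappa."""
    a = shift
    return ((a ** lam - 1.0) / lam
            + a ** (lam - 1.0) * kappa
            + 0.5 * (lam - 1.0) * a ** (lam - 2.0) * kappa ** 2
            + (lam - 1.0) * (lam - 2.0) / 6.0 * a ** (lam - 3.0) * kappa ** 3)


def truncation_error(kappa, lam, shift):
    """Absolute difference between the exact transform and the truncated series."""
    return abs(box_cox_exact(kappa, lam, shift) - box_cox_third_order(kappa, lam, shift))
```

For example, `truncation_error(0.01, -1.13, 0.14)` is small compared with the linear term, while the error grows quickly as $\kappa$ approaches the shift $a$, which indicates why the higher-order terms discussed below matter.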

It is evident from equation (8) that the two-point correlation
of $\bar\kappa$ is a function of all n-point correlations of the
original convergence. The d-th spectrum of the convergence
is defined via



(9) 



where the subscript c denotes the connected moments. We
identify the second-order spectrum $S_\kappa^{(2)} \equiv P_\kappa$ with the power
spectrum, $S_\kappa^{(3)} \equiv B_\kappa$ with the convergence bispectrum, and
$S_\kappa^{(4)} \equiv T_\kappa$ with the connected convergence trispectrum. Making
use of these definitions, one arrives at the following expression,
containing terms up to fourth order in $\kappa$,



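$$
\begin{aligned}
P_{\bar\kappa}(\ell) ={}& a^{2\lambda-2} P_\kappa(\ell) + (\lambda-1)\, a^{2\lambda-3} \int \frac{{\rm d}^2\ell_1}{(2\pi)^2}\, B_\kappa(\ell, \ell_1, |\boldsymbol{\ell} - \boldsymbol{\ell}_1|) \\
&+ (\lambda-1)(\lambda-2)\, a^{2\lambda-4}\, P_\kappa(\ell) \int \frac{{\rm d}^2\ell_1}{(2\pi)^2}\, P_\kappa(\ell_1) \\
&+ \frac{(\lambda-1)^2}{4}\, a^{2\lambda-4} \left\{ 2 \int \frac{{\rm d}^2\ell_1}{(2\pi)^2}\, P_\kappa(\ell_1)\, P_\kappa(|\boldsymbol{\ell} - \boldsymbol{\ell}_1|) + \int \frac{{\rm d}^2\ell_1}{(2\pi)^2} \int \frac{{\rm d}^2\ell_2}{(2\pi)^2}\, T_\kappa(\boldsymbol{\ell}_1, \boldsymbol{\ell} - \boldsymbol{\ell}_1, \boldsymbol{\ell}_2, -\boldsymbol{\ell} - \boldsymbol{\ell}_2) \right\} \\
&+ \frac{(\lambda-1)(\lambda-2)}{3}\, a^{2\lambda-4} \int \frac{{\rm d}^2\ell_1}{(2\pi)^2} \int \frac{{\rm d}^2\ell_2}{(2\pi)^2}\, T_\kappa(\boldsymbol{\ell}, \boldsymbol{\ell}_1, \boldsymbol{\ell}_2, -\boldsymbol{\ell} - \boldsymbol{\ell}_1 - \boldsymbol{\ell}_2) .
\end{aligned} \tag{10}
$$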

We provide the technical details of this and the following
computations in Appendix A. Third- or higher-order contributions
consist of integrals that run up to infinitely high angular
frequencies. The finite resolution of simulations or observational
data and our ignorance of e.g. the bispectrum in
the highly non-linear regime necessitate a truncation of these
integrals, which is achieved by Gaussian smoothing of the
convergence maps before the transformation. The Fourier
transform of the Gaussian kernel reads

$$ W(\ell) = \exp \left\{ -\frac{\sigma_{\rm G}^2\, \ell^2}{2} \right\} , \tag{11} $$

where $\sigma_{\rm G}$ denotes the dispersion of the Gaussian. We will
specify $\sigma_{\rm G}$ in terms of the number of pixels in the convergence
map that it covers, where the pixel size is approximately
0.3 arcmin.
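To get a feeling for the scales affected by the smoothing, the kernel can be evaluated for a given width in pixels. The Python sketch below assumes the standard Fourier transform of a unit-normalised Gaussian, exp(-sigma^2 ell^2 / 2), with the dispersion converted from pixels to radians using the approximate pixel size of 0.3 arcmin; the names are illustrative.

```python
import math

PIXEL_SIZE_ARCMIN = 0.3                       # approximate pixel size from the text
ARCMIN_TO_RAD = math.pi / (180.0 * 60.0)


def smoothing_kernel(ell, sigma_pixels, pixel_size_arcmin=PIXEL_SIZE_ARCMIN):
    """Fourier-space Gaussian kernel W(ell) for a dispersion given in pixels."""
    sigma_rad = sigma_pixels * pixel_size_arcmin * ARCMIN_TO_RAD
    return math.exp(-0.5 * (sigma_rad * ell) ** 2)


if __name__ == "__main__":
    for ell in (100, 1500, 5000):
        print(ell, round(smoothing_kernel(ell, 5.0), 3))
```

Under these assumptions a 5-pixel kernel suppresses the convergence amplitude by roughly 20 per cent at $\ell = 1500$ and far more strongly at higher angular frequencies.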

Moreover, equation (10) illustrates that any non-linear
transformation entails a more complex dependence on noise. 
Although we have made the simple assumption that the 
shape noise is Gaussian distributed and hence fully described 
by its power spectrum, the transformed power spectrum still 
receives extra contributions, e.g. from the Gaussian four-point
terms. If the distribution of intrinsic ellipticities is not
Gaussian (see e.g. van Waerbeke et al. 2006), $P_{\bar\kappa}(\ell)$ will also
contain its higher moments. In the cases without shape noise 
that we will consider, shot noise caused by the discrete sum- 
mation over particle positions in the simulation to obtain 
the convergence might become relevant. This particle shot 
noise should be approximately Gaussian, and we will thus 
represent both shape and shot noise by a scale-independent 
power spectrum $P_{\rm noise}$.

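Taking the effects of smoothing and noise into account,
one arrives at the following model for the transformed convergence
power spectrum,

$$
\begin{aligned}
P_{\bar\kappa}(\ell) ={}& a^{2\lambda-2} \left\{ P_\kappa(\ell) + P_{\rm noise} \right\} W^2(\ell) \\
&+ (\lambda-1)\, a^{2\lambda-3}\, W(\ell) \int_0^\infty \frac{{\rm d}\ell_1\, \ell_1}{2\pi} \int_0^{2\pi} \frac{{\rm d}\varphi}{2\pi}\, B_\kappa \left[ \ell, \ell_1, \ell_A(\ell, \ell_1, \varphi) \right] W(\ell_1)\, W \left[ \ell_A(\ell, \ell_1, \varphi) \right] \\
&+ (\lambda-1)(\lambda-2)\, a^{2\lambda-4} \left\{ P_\kappa(\ell) + P_{\rm noise} \right\} W^2(\ell) \int_0^\infty \frac{{\rm d}\ell_1\, \ell_1}{2\pi} \left\{ P_\kappa(\ell_1) + P_{\rm noise} \right\} W^2(\ell_1) \\
&+ \frac{(\lambda-1)^2}{2}\, a^{2\lambda-4} \int_0^\infty \frac{{\rm d}\ell_1\, \ell_1}{2\pi} \left\{ P_\kappa(\ell_1) + P_{\rm noise} \right\} W^2(\ell_1) \int_0^{2\pi} \frac{{\rm d}\varphi}{2\pi} \left\{ P_\kappa \left[ \ell_A(\ell, \ell_1, \varphi) \right] + P_{\rm noise} \right\} W^2 \left[ \ell_A(\ell, \ell_1, \varphi) \right] + \mathcal{H}(\ell) ,
\end{aligned} \tag{12}
$$

where we defined

$$ \ell_A(\ell, \ell_1, \varphi) = \sqrt{\ell^2 + \ell_1^2 - 2\, \ell\, \ell_1 \cos\varphi} \tag{13} $$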



for convenience, and introduced a term $\mathcal{H}(\ell)$ which includes
a number of higher-order contributions as detailed below.
The smoothing kernel, $W$, now suppresses the remaining
integrands exponentially for large values of angular frequencies.
If $P_{\rm noise}$ attains values of the same order of magnitude
as the original convergence power spectrum over a range
of angular frequencies which are well outside the smoothing
regime, it yields significant contributions in particular to the
four-point term.
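The Gaussian four-point contributions to the model involve one-dimensional integrals over the smoothed power spectrum that are straightforward to evaluate numerically. The Python sketch below implements the triangle side of equation (13) and the integral of $\ell/(2\pi)$ times the power spectrum times $W^2$, assuming a Gaussian kernel of the form exp(-sigma^2 ell^2 / 2); the trapezoidal rule, the cut-off at a fixed multiple of 1/sigma, and all names are illustrative choices. Any noise power can simply be included in the `power` callable.

```python
import math


def ell_a(ell, ell1, phi):
    """Third side of the triangle spanned by ell and ell1, as in equation (13)."""
    return math.sqrt(max(ell ** 2 + ell1 ** 2 - 2.0 * ell * ell1 * math.cos(phi), 0.0))


def smoothed_variance(power, sigma_rad, n_steps=20000, n_sigma=8.0):
    """Trapezoidal estimate of int dl l/(2 pi) P(l) W^2(l) for a Gaussian kernel W."""
    ell_max = n_sigma / sigma_rad             # integrand is negligible beyond this
    d_ell = ell_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        ell = i * d_ell
        weight = 0.5 if i in (0, n_steps) else 1.0
        kernel_sq = math.exp(-(sigma_rad * ell) ** 2)   # W^2 for W = exp(-s^2 l^2 / 2)
        total += weight * ell / (2.0 * math.pi) * power(ell) * kernel_sq
    return total * d_ell
```

For a scale-independent spectrum the integral has the closed form $P / (4\pi\sigma_{\rm G}^2)$, which provides a convenient check of the numerical accuracy and shows how strongly white noise contributes when the smoothing scale is small.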

A priori it is not guaranteed that the connected trispec- 
trum terms or higher-order correlations are small; indeed we 
find that including terms only up to the Gaussian four-point 
level results in substantially biased parameter constraints. 
Thus we incorporate a number of higher-order contributions 
into the model, subsumed into the term $\mathcal{H}(\ell)$. Here, we only
provide a brief synopsis of the calculation of $\mathcal{H}$, deferring
the technical details to Appendix B.

While results for the trispectrum from tree-level perturbation
theory exist in the literature (Fry 1984) and higher orders
could be derived analogously, their non-linear evolution is
very likely to be important in our modelling, yet unknown
to date. As will be discussed further in Section 4.2, even
the modelling of the convergence bispectrum in the mildly
non-linear regime already introduces a significant amount of
uncertainty. Hence we take a different approach and assume
that the original convergence follows a lognormal distribu- 
tion, which should be reasonably accurate given the good 
performance of the logarithmic transformations to Gaussian- 
ity at the one-point level (see below), and which allows us 
to proceed analytically. 

Equation (12) contains all terms up to second order
in $P_\kappa$, so that we include all terms proportional to $P_\kappa^3$ under
the assumption of multivariate lognormality into $\mathcal{H}$.
This includes the lowest-order contribution to the lognor- 
mal trispectrum, the unconnected part of the convergence 
five-point correlation, and the Gaussian six-point correla- 
tion. We find that the connected moments calculated from 
the lognormal model significantly underestimate the mo- 
ments of the original convergence fields as measured from 
the simulations. Therefore we re-calibrate the amplitudes of 
the different contributions to the model to match the respec- 
tive simulation signal, thereby implicitly assuming that the 
angular dependence of the lognormal model is accurate. 



4 RESULTS 

The main goal of this work is to assess the cosmological 
information contained in the Box-Cox transformed conver- 
gence fields, which we will quantify in terms of a figure of 
merit for the two best-constrained cosmological parameters 
in weak lensing surveys, $\Omega_{\rm m}$ and $\sigma_8$. In a first step we determine
transformations that optimally Gaussianise the one-point
distribution of the convergence before testing how well
our analytic models recover the simulation results. The performance
of the transformed convergence is further investigated
via the signal-to-noise, power spectrum covariances,
and a likelihood analysis in the $\Omega_{\rm m}$-$\sigma_8$ plane. We will first
work with the idealistic case of convergence maps that only
contain low levels of discreteness noise from the underlying
N-body simulations, but repeat the analysis adding a realistic
amount of shape noise in Section 4.5.

Table 1. Overview of the transformations used in this work. In
each case we consider a Box-Cox transformation with optimised
parameters and a logarithmic transformation (which corresponds
to $\lambda = 0$). Parameters are determined from the unsmoothed convergence
fields, from fields smoothed by a Gaussian of width 5
pixels, and from fields with shape noise added. The identifiers
used in the remainder of this paper, and the values of the Box-Cox
parameters $\lambda$ and $a$ are also listed.

transformation   convergence map   identifier   $\lambda$   $a$
Box-Cox          unsmoothed        BC1          -1.13       0.14
logarithmic      unsmoothed        LOG1          0          0.07
Box-Cox          smoothed          BC2          -2.20       0.08
logarithmic      smoothed          LOG2          0          0.03
Box-Cox          shape noise       BCs          -7.47       0.62
logarithmic      shape noise       LOGs          0          0.07



4.1 Optimal transformations 

Each convergence map contains $2048^2$ pixels whose convergence
values we use as the input data-vector for equation
(6) to determine the optimal Box-Cox parameters. To gain
further insight and keep the numerics tractable, we compute 
optimal values for A and a for each realisation individually 
and obtain the final pair of Box-Cox parameters by taking 
the mean over the 100 realisations. 

For comparison we also investigate logarithmic transformations,
i.e. $\lambda = 0$ in the parametrisation given by equation
(4), where we determine the shift $a$ to be slightly larger than
the absolute value of the minimum convergence in all 100 re- 
alisations. Since it is clear that the convergence maps need to 
be smoothed for the likelihood analysis, the question arises 
whether the transformation parameters shall be estimated 
from the untreated or the smoothed fields. We will investi- 
gate both cases, where throughout a smoothing kernel with 
a width of 5 pixels is used which, as will be demonstrated 
below, is suited to suppress noise and modelling uncertainty 
on small scales. 

In Fig. 2 we illustrate the dependence of the properties
of the one-point distribution of the transformed convergence
on $\lambda$ and $a$. As diagnostics we use the skewness, excess
kurtosis, and the Kullback-Leibler divergence $D_{\rm KL}$ between
the convergence distribution and a Gaussian with the same
mean and variance, defined as



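$$ D_{\rm KL} = \int {\rm d}\kappa\; P_{\rm 1pt}(\kappa)\, \ln \frac{P_{\rm 1pt}(\kappa)}{G(\kappa)} , \tag{14} $$

where $G(\kappa)$ denotes the Gaussian reference distribution,
and likewise for the transformed convergence. Note that we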
place the Gaussian reference distribution in the denominator 
because it has infinite support. All considered quantities ap- 
proach zero as the transformed convergence becomes more 
Gaussian. 
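The three Gaussianity diagnostics can be estimated from a sample of pixel values along the following lines. This Python sketch uses simple moment estimators and a histogram-based estimate of the Kullback-Leibler divergence with the Gaussian reference in the denominator; the binning, the use of the Gaussian density at bin centres, the natural logarithm, and all names are assumptions made for illustration rather than the estimators used in the paper.

```python
import math
import random


def skewness_and_excess_kurtosis(values):
    """Moment estimators of skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def kl_to_gaussian(values, n_bins=50):
    """Histogram estimate of D_KL between the sample and a moment-matched Gaussian."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    kl = 0.0
    for i, count in enumerate(counts):
        if count == 0:
            continue                          # empty bins do not contribute
        centre = lo + (i + 0.5) * width
        gauss = math.exp(-0.5 * (centre - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        p = count / n
        kl += p * math.log(p / (gauss * width))
    return kl


# Toy comparison (assumption): Gaussian versus lognormal samples.
rng = random.Random(2)
gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
lognormal_sample = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(20000)]
```

Applied to the toy samples, all three diagnostics are close to zero for the Gaussian draw and clearly positive for the lognormal one, mirroring the behaviour expected before and after a successful transformation.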



The Kullback-Leibler divergence obtains its minimum
along a linear degeneracy line in the $\lambda$-$a$ plane. All optimal
Box-Cox parameters determined from the 100 realisations
come to lie close to this minimum, their distribution being
excellently fit by the line

$$ a = m \lambda + b \quad \mbox{with} \quad m = (-6.45 \pm 0.07) \times 10^{-2} ; \quad b = (7.02 \pm 0.08) \times 10^{-2} . \tag{15} $$
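The linear degeneracy can be cross-checked against the parameter values listed in Table 1 for the unsmoothed fields. The Python sketch below assumes that both fit coefficients carry a factor of $10^{-2}$, which is the reading that reproduces the tabulated shifts; the names are illustrative.

```python
M_SLOPE = -6.45e-2        # slope of the degeneracy line (assumed exponent 10^-2)
B_INTERCEPT = 7.02e-2     # intercept of the degeneracy line (assumed exponent 10^-2)


def shift_on_degeneracy_line(lam, slope=M_SLOPE, intercept=B_INTERCEPT):
    """Shift parameter a predicted by the linear fit for a given power lambda."""
    return slope * lam + intercept


if __name__ == "__main__":
    print(round(shift_on_degeneracy_line(-1.13), 3))   # compare with a = 0.14 (BC1)
    print(round(shift_on_degeneracy_line(0.0), 3))     # compare with a = 0.07 (LOG1)
```

Both predictions agree with the tabulated values to the quoted precision, consistent with the statement that the logarithmic transformation with a shift set by the minimum convergence also lies in the valley of minimum $D_{\rm KL}$.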

Contours of equal skewness are close to straight lines, and
the region of vanishing skewness matches the valley in $D_{\rm KL}$.
The kurtosis varies along this line, but apparently does not
cause a significant deviation from a Gaussian distribution
since $D_{\rm KL}$ as a global measure of Gaussianity remains approximately
constant. The scatter of optimal Box-Cox parameter
values along the degeneracy line is caused by cosmic
variance and determined by the varying position of the
intersection with the zero-kurtosis contour, which for the realisation
used to produce Fig. 2, right panel, is close to the
mean of $\lambda$ and $a$ taken over all realisations.

In Table 1 an overview of the different transformations
and their parameters is provided. The Box-Cox transformations
generally prefer negative values for $\lambda$; e.g. if applied to
the unsmoothed convergence fields, the optimum is close to
an inverse transformation. Values of $\lambda < 0$ imply that high-density
peaks in the convergence fields are downweighted
even more strongly than for a logarithmic transformation (see
Fig. 1). Note that deriving the value of the shift $a$ for the
logarithmic transformation from the minimum value of the
convergence produces a pair of Box-Cox parameters that is
also located in the valley of minimum skewness and $D_{\rm KL}$, as
indicated by the blue triangle in the left panel of Fig. 2.

The plots of the one-point distribution of $\kappa$ shown in
Fig. 3 confirm that both Box-Cox and logarithmic transformation
effectively remove the pronounced skewness of the
original convergence distribution. The logarithmic transform
features a significant deviation from the Gaussian case for
values $3\sigma$ and more above the mean which is avoided in
the optimal Box-Cox transform by the negative value of $\lambda$.
When applied to the unsmoothed fields, the Box-Cox transformed
distribution deviates less than $\pm 10\%$ in the range
$\pm 3\sigma$ around the mean while the logarithmic transformation
features slightly larger deviations for $\bar\kappa$ values close to the
mean and also differs from the Gaussian more significantly
for extreme values of $\kappa$.

Smoothing the convergence field flattens high-density
peaks and makes voids more shallow, so that the distribution
of original convergence values is modified to look slightly
more Gaussian. Nonetheless the transformations we consider
perform somewhat worse in rendering the one-point distribution
Gaussian, which applies in particular to the logarithmic
transformation for $\kappa$ values far from the mean, see the
right panel of Fig. 3. Note that the optimal Box-Cox parameters
determined from the smoothed convergence fields follow
a similarly well defined linear relation as the one shown in
Fig. 2, only shifted to more negative values of $\lambda$.

Table 2 lists the mean and standard deviation, computed
from 100 realisations, of skewness, kurtosis, and D_KL
for the original and transformed convergence fields. For
both unsmoothed and smoothed convergence, the logarithmic
transformation improves all three diagnostics by at least
an order of magnitude, while the Box-Cox transformation
adds another factor of 10 reduction in skewness and kur- 
Figure 2. Left panel: Kullback-Leibler divergence D_KL as a function of the Box-Cox parameters λ and a for one randomly chosen
realisation of an unsmoothed convergence field. The grey shading corresponds to D_KL, as indicated by the colour bar, with smallest
values shown in white. We also plot the distribution of optimal Box-Cox parameters λ and a determined from each of the 100 realisations
of convergence fields as black points. The red line indicates the linear fit. The blue dotted lines indicate the values of λ and a
used for the Box-Cox transformation BC1. The values of λ and a that correspond to the logarithmic transformation are marked by the
blue triangle. Right panel: Same as above, but for skewness and kurtosis of the Box-Cox transformed convergence field. Black contours
correspond to the skewness and are linearly spaced with steps of 0.25, with values of 0 given by the red line and negative values shown
as dotted lines. The grey shading corresponds to the kurtosis, as indicated by the colour bar, with values above 0.7 shown in white. The
green curve indicates vanishing kurtosis.
Figure 3. One-point distribution of convergence values. The bottom panels show the convergence distribution, stacked for 5 randomly
chosen realisations of convergence fields and corrected to zero mean and unit variance. Shown is the original distribution of the convergence 
as red solid line, the logarithmically transformed convergence as blue dotted line, and the Box-Cox transformed convergence as black solid 
line. For reference a Gaussian distribution is shown in grey. The top panels display the relative deviation from the Gaussian distribution 
after logarithmic and Box-Cox transformation. Left panel: For the unsmoothed convergence. Right panel: For the convergence smoothed 
with a Gaussian kernel of width 5 pixels. 
Table 2. Mean and standard deviation of skewness, kurtosis, and Kullback-Leibler divergence D_KL as measured from the 100 realisations
of convergence fields. The fields are either unsmoothed, smoothed by a Gaussian of width 5 pixels, or incorporate shape noise. In each
case results are given for the original, the logarithmically transformed, and the Box-Cox transformed convergence fields. Values for the
log-arctan transformation applied to the noisy convergence fields are listed as well.

convergence map   analysis             skewness                  kurtosis                  D_KL

unsmoothed        original             2.05 ± 0.45               18.77 ± 22.04             0.126 ± 0.006
                  logarithmic (LOG1)   (0.57 ± 1.92) × 10^-2     (3.93 ± 0.69) × 10^-1     (3.47 ± 0.70) × 10^-3
                  Box-Cox (BC1)        (0.04 ± 1.29) × 10^-2     (-2.86 ± 3.78) × 10^-2    (1.33 ± 0.18) × 10^-3

smoothed          original             1.80 ± 0.32               8.65 ± 6.24               0.131 ± 0.012
                  logarithmic (LOG2)   (1.02 ± 0.61) × 10^-1     (5.50 ± 1.19) × 10^-1     (5.78 ± 1.40) × 10^-3
                  Box-Cox (BC2)        (0.24 ± 3.94) × 10^-2     (2.37 ± 5.35) × 10^-2     (1.42 ± 0.26) × 10^-3

shape noise       original             0.83 ± 0.17               3.11 ± 2.36               0.038 ± 0.007
                  logarithmic (LOGs)   (-4.74 ± 6.55) × 10^-2    (7.28 ± 1.48) × 10^-1     (7.49 ± 1.46) × 10^-3
                  Box-Cox (BCs)        (-2.32 ± 4.31) × 10^-2    (4.33 ± 0.78) × 10^-1     (4.22 ± 0.84) × 10^-3
                  log-arctan           (-5.04 ± 5.70) × 10^-2    (-1.86 ± 4.37) × 10^-2    (1.74 ± 0.29) × 10^-3

low for efficient interpolation, we calculate in practice 
the convergence bispectrum as a function of two triangle side
lengths ℓ_1, ℓ_2 and their internal angle φ, with dense
binning between ℓ = 1 and ℓ ~ 10,000 (where the smoothing has
safely suppressed all contributions to zero), and for
φ ∈ [0; π].

Kiessling et al. (2011) found that in the simulations under
consideration angular frequencies larger than ℓ ~ 1500
are significantly affected by particle shot noise and thus
discarded these scales in their cosmological analysis, so that
we can safely choose a smoothing scale that downweights
scales ℓ > 1500. In addition, we have to limit our study to
sufficiently large scales on which higher-order correlations
that we are not able to model have not yet become important.

In Fig. 4, top panel, the mean simulation power spectrum
using transformation BC1 (for the definition of identifiers
see Table 1), averaged over 100 realisations, is shown
without any smoothing as well as for smoothing with kernels
of width 2, 4, and 5 times the pixel size of the convergence
map of 0.3 arcmin. Note that in this and all similar figures
we use the error bars corresponding to a single realisation,
i.e. a 100 deg² patch; errors on the mean from all 100
realisations are smaller by a factor of 10. In addition we plot
the power spectrum models obtained by using the corresponding
smoothing window. Increasing the smoothing scale boosts
the simulation signal on large scales and significantly
reduces it at ℓ ~ 1000 and above. The model is biased high at
high angular frequencies for the 2-pixel kernel, but provides
a good fit to the simulation data on all scales for the 4- and
5-pixel kernels.

Using a narrow smoothing kernel, one includes more
information from highly non-linear scales in the transformed
power spectrum model. On these scales the prescription for
the bispectrum and the included higher-order contributions
becomes uncertain and the neglected higher-order statistics
become more important, hence the bias. Henceforth we adopt a
kernel width of 5 pixels (corresponding to 1.5 arcmin), which
balances the systematic offset due to inaccurate modelling
against the suppression of cosmological information at high
angular frequencies, visible in the decrease in the amplitude
of the transformed power spectrum, which sets in at
increasingly smaller ℓ for wider kernels.
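As a rough guide to how strongly a given kernel downweights high angular frequencies, the sketch below evaluates the squared Fourier-space window of a Gaussian kernel in the flat-sky approximation. It assumes that the quoted kernel "width" is the standard deviation of the Gaussian (the text does not specify this), and it describes the suppression of the power spectrum of the smoothed, untransformed convergence only; the transformed power spectrum responds non-linearly to smoothing. All names are illustrative.

```python
import numpy as np

ARCMIN = np.pi / (180.0 * 60.0)    # one arcminute in radians

def gaussian_window_sq(ell, sigma_arcmin):
    """Squared window |W(ell)|^2 = exp(-ell^2 sigma^2) of a Gaussian kernel.

    sigma_arcmin is interpreted as the kernel standard deviation (assumption).
    """
    sigma = sigma_arcmin * ARCMIN
    return np.exp(-(np.asarray(ell, dtype=float) * sigma) ** 2)

PIXEL_ARCMIN = 0.3                 # pixel size quoted in the text

for n_pix in (2, 4, 5):
    suppression = gaussian_window_sq(1500.0, n_pix * PIXEL_ARCMIN)
    print(f"{n_pix}-pixel kernel: power at ell = 1500 multiplied by {suppression:.2f}")
```

Under this assumption the 5-pixel kernel leaves roughly two thirds of the untransformed power at ℓ = 1500 and removes essentially all of it by ℓ ~ 10,000.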



tosis. The Kullback-Leibler divergence decreases less when 
switching from logarithmic to Box-Cox transformation, by
factors of 2.6 and 4.1 for the unsmoothed and smoothed fields,
respectively. To illustrate the absolute values of D_KL, one
can compare them to D_KL for two Gaussian distributions with
identical variance but shifted means. We find that D_KL = 0.1,
as found for the original convergence distribution,
corresponds to a shift in the mean of half a standard
deviation. Similarly, one obtains shifts of 0.1σ (0.04σ) for
D_KL = 5 × 10^-3 (D_KL = 10^-3), which is of the same order as
the results for the logarithmic and Box-Cox transformations,
respectively.
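The correspondence between D_KL and a shift of the mean can be checked directly. For two Gaussians of equal variance σ² whose means differ by Δ, the Kullback-Leibler divergence is Δ²/(2σ²). The sketch below evaluates this expression and verifies it numerically, assuming natural logarithms in the definition of D_KL; the function names are illustrative.

```python
import numpy as np

def kl_shifted_gaussians(shift_in_sigma):
    """Analytic D_KL (in nats) between two equal-variance Gaussians
    whose means differ by shift_in_sigma standard deviations."""
    return 0.5 * shift_in_sigma**2

def kl_numeric(shift_in_sigma, n=200001, half_width=12.0):
    """Numerical cross-check: sum p ln(p/q) on a fine regular grid."""
    x = np.linspace(-half_width, half_width, n)
    log_p = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
    log_q = -0.5 * (x - shift_in_sigma) ** 2 - 0.5 * np.log(2.0 * np.pi)
    integrand = np.exp(log_p) * (log_p - log_q)
    return float(np.sum(integrand) * (x[1] - x[0]))

for shift in (0.5, 0.1, 0.04):
    print(shift, kl_shifted_gaussians(shift), kl_numeric(shift))
```

A shift of 0.5σ gives D_KL = 0.125, a shift of 0.1σ gives 5 × 10^-3, and a shift of 0.04σ gives 8 × 10^-4, in line with the values quoted above.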



4.2 Modelling accuracy 

To model the power spectra calculated from the Box-Cox
and logarithmically transformed convergence fields according
to equation (12), convergence power spectra and bispectra
are required. We compute the matter power spectrum
P_δ(k) for the simulation cosmology, employing the transfer
function by Eisenstein & Hu (1998) and the correction for
the non-linear regime by Smith et al. (2003). As was
demonstrated in Kiessling et al. (2011), our model power
spectra match the simulation results well in the relevant
angular frequency regime.

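The convergence power spectrum is then given by the
Limber equation (Kaiser 1993)

    P_\kappa(\ell) = \frac{9 H_0^4 \Omega_{\rm m}^2}{4 c^4} \int_0^{\chi_{\rm s}} {\rm d}\chi \, \frac{g^2(\chi)}{a^2(\chi)} \, P_\delta\!\left( \frac{\ell}{\chi}, \chi \right) ,   (16)

where we used the lensing efficiency g(χ) = 1 − χ/χ_s,
with χ_s = χ(z_s) and z_s = 1. The analogous equation for the
convergence bispectrum reads (e.g. Takada & Jain 2004)

    B_\kappa(\ell_1,\ell_2,\ell_3) = \frac{27 H_0^6 \Omega_{\rm m}^3}{8 c^6} \int_0^{\chi_{\rm s}} {\rm d}\chi \, \frac{g^3(\chi)}{\chi \, a^3(\chi)} \, B_\delta\!\left( \frac{\ell_1}{\chi}, \frac{\ell_2}{\chi}, \frac{\ell_3}{\chi}, \chi \right) .   (17)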
The matter bispectrum B_δ(k_1,k_2,k_3) is computed via
perturbation theory (Fry 1984) from the matter power
spectrum, applying the corrections due to non-linear structure
evolution given in Scoccimarro & Couchman (2001). To al-
Figure 4. Top panel: Box-Cox transformed convergence power
spectrum obtained from the mean of the 100 realisations. Grey
circles correspond to the power spectrum computed from the
unsmoothed convergence, blue squares (purple upward triangles;
black downward triangles) to the power spectrum obtained from
the convergence smoothed with a Gaussian kernel of width 2 (4;
5) pixels. The corresponding models for the smoothed power
spectra are shown as blue dotted lines (width 2 pixels), purple
dashed lines (width 4 pixels), and black solid lines (width 5
pixels). Note that points and curves have been slightly offset
horizontally for clarity. Centre panel: Contributions to the
smoothed (5 pixels) Box-Cox transformed power spectrum (BC1).
The model including up to two-point (three-point; Gaussian
four-point; all) terms is shown as orange (red; violet; black)
curve. Triangles indicate the simulation results, cf. the top
panel. The light grey area indicates the uncertainty due to the
modelling of the bispectrum in the non-linear regime, computed
from equation (18). The dark grey area shows the variation in
the model resulting from the normalisation of the trispectrum
contribution r_4; see equation (B20). Bottom panel: Same as in
the centre panel, but based on the BC2 transformation.



Particle shot noise is incorporated throughout in our
models, computed via the analytical formula given in
Kiessling et al. (2011), equation (15), which yields
P_noise ~ 2.4 × 10^-12. The 5-pixel window effectively
downweights the regime where shot noise becomes important, so
that the models are affected by less than 3 % in the range
100 < ℓ < 1500.

The centre panel of Fig. 4 details the contributions of
terms with different orders of κ to the model of the
transformed power spectrum, again for transformation BC1. The
two-point term is simply a rescaled version of the original
convergence power spectrum, while the three-point
contribution is negative due to λ < 0, see equation (12), and
has a stronger effect at high angular frequencies. The
Gaussian four-point term adds an almost constant contribution
to the model over the range of ℓ considered.

The combined higher-order contribution is also positive
and surpasses the Gaussian four-point term in amplitude
(although the latter is second order in P_κ), in particular on
small scales. The full model yields a good fit to the
simulation, being marginally low at high angular frequencies,
but note that in this regime error bars are significantly
correlated.

We construct a toy model to estimate the influence of the
limited accuracy of modelling the non-linear matter
bispectrum by the Scoccimarro & Couchman (2001) fitting
formula. A multiplicative term f is introduced which modifies
the convergence bispectrum to
B_κ(ℓ_1,ℓ_2,ℓ_3) → B_κ(ℓ_1,ℓ_2,ℓ_3) f(ℓ_1,ℓ_2).

Scoccimarro & Couchman (2001) found little dependence
of the accuracy of their fit on the internal angle
of a triangle of angular frequencies, so that, without loss
of generality, we assume f to depend only on ℓ_1 and ℓ_2.
Due to the lack of information about any further dependence
on triangle shapes, we furthermore assume that any deviation
of the fit can be phrased in terms of the mean side length
ℓ̄ = (ℓ_1 + ℓ_2)/2. Judging from the plots in
Scoccimarro & Couchman (2001), the formula fits their
ΛCDM simulations well up to k ~ 0.5 h Mpc^-1. We translate
this into a scale ℓ ~ 700, as the sensitivity of lensing peaks
at about half the source distance at z_s = 1.

The discrepancy between fit formula and simulation
seems to increase linearly at high wavenumbers, with an
average accuracy of 15 % (Scoccimarro & Couchman 2001),
so that we define

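    f(\ell_1,\ell_2) = \begin{cases} 1 & \text{if } \bar\ell \le 700 \\ 1 + 0.15 \left( \bar\ell / 700 - 1 \right) & \text{if } \bar\ell > 700 \end{cases} .   (18)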

Although the fit formula persistently underestimates the
simulations used in that work, we understand the model in
equation (18) as a rough estimate of the general accuracy
of the fit and consider both positive and negative deviations
from the formula, which leads to the light grey band of
uncertainty in the transformed convergence power spectrum
shown in Fig. 4. At high ℓ this uncertainty amounts to about
±5 % for the BC1 transformation and is thus of the same
order as the Gaussian four-point contribution.

The dark grey regions shown in Fig. 4 correspond to
the uncertainty induced by the measurement error of the
connected fourth moment, entering r_4, which we determine
from the simulations to normalise the lognormal trispectrum
contribution in H(ℓ); see equations (12) and (B20). For the
Figure 5. Top panels: Correlation coefficients for the power spectrum covariance, computed from the original convergence fields (red
solid lines), the Box-Cox transformed fields (black lines), and the logarithmically transformed fields (blue lines). In the two latter cases
solid curves were obtained for the transformations determined from the smoothed convergence fields (BC2, LOG2), and dotted curves for
the transformations determined from the unsmoothed fields (BC1, LOG1). In the left panel the correlation at low angular frequencies
(around ℓ = 65) is shown; in the right panel the correlation for medium-high angular frequencies (around ℓ = 420). Bottom panels:
Same as above, but for angular frequencies around ℓ = 1350, close to the maximum used for the likelihood analysis. The left panel, like
the top panels, shows results obtained without shape noise, while in the right panel shape noise is included, using the corresponding
transformation parameters. We additionally show the results for the log-arctan transformation discussed in Section 4.5 as green dashed
curves.



BC1 transformation the higher-order terms are small and so
is the uncertainty due to r_4.

The bottom panel of Fig. 4 displays the model details
for the Box-Cox transformation BC2 (determined from the
smoothed convergence fields). Since, compared to BC1, λ is
more negative and a closer to zero, the higher-order
contributions are boosted much more strongly, which entails a
larger impact of the uncertainty in the bispectrum and r_4,
the latter having a weaker dependence on ℓ and thus dominating
on large and intermediate scales. Despite these substantial
sources of uncertainty and the high amplitudes of each of the
three-point, four-point, and H terms, the full model provides
an excellent fit to the simulation power spectrum also in the
BC2 case.



4.3 Noise properties 

If Gaussianising the one-point distribution of the
convergence succeeds in turning convergence maps into
approximate realisations of a Gaussian random field, one
expects the covariance of the convergence power spectrum to be
diagonal. Conversely, any significant cross-correlation
between angular frequencies is a clear sign of a non-zero
trispectrum (e.g. Pielorz et al. 2010). We determine the power
spectrum covariance



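    {\rm Cov}_P(\ell,\ell') = \langle P_\kappa(\ell) \, P_\kappa(\ell') \rangle - \langle P_\kappa(\ell) \rangle \langle P_\kappa(\ell') \rangle   (19)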



from the simulations, where angular brackets denote the av- 
erage over the 100 realisations, and subsequently correlation 
coefficients 



    r_{\rm c}(\ell,\ell') = \frac{{\rm Cov}_P(\ell,\ell')}{\sqrt{{\rm Cov}_P(\ell,\ell) \, {\rm Cov}_P(\ell',\ell')}} .   (20)



These and the following equations all hold likewise for the 
transformed power spectra. 
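The covariance and correlation coefficients defined above can be estimated from a set of binned power spectra as in the following sketch. It assumes the spectra are stored as an array of shape (number of realisations, number of ℓ bins); the function names and the random test data are illustrative. The covariance uses the plain ensemble average of equation (19), i.e. a 1/N normalisation.

```python
import numpy as np

def power_spectrum_covariance(spectra):
    """<P(l) P(l')> - <P(l)><P(l')> averaged over realisations (axis 0)."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    second_moment = spectra.T @ spectra / spectra.shape[0]
    return second_moment - np.outer(mean, mean)

def correlation_coefficients(cov):
    """Normalise a covariance matrix to unit diagonal."""
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)

# Illustrative data: 100 realisations of 5 uncorrelated bins.
rng = np.random.default_rng(0)
spectra = rng.normal(loc=1.0, scale=0.1, size=(100, 5))

cov = power_spectrum_covariance(spectra)
r_c = correlation_coefficients(cov)
print(np.round(r_c, 2))
```

For a Gaussian random field the off-diagonal elements of `r_c` scatter around zero; persistent positive off-diagonal values indicate mode coupling.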

In Fig. 5 we show r_c for the original as well as for all
transformed convergence fields. The power spectra
transformed according to BC1 and LOG1 have been computed
from the unsmoothed fields and those transformed according
to BC2 and LOG2 from the smoothed convergence, i.e.
the transformations have been applied to the cases where
they should work optimally. The original convergence power
spectrum features significant positive cross-correlations for
ℓ, ℓ' > 200 which rise up to r_c = 0.9 for ℓ, ℓ' > 1000. Box-Cox
Figure 6. Cumulative S/N as a function of the maximum
angular frequency included. Results for the original (Box-Cox
transformed; logarithmically transformed) convergence fields are
shown as red (black; blue) curves. In the case of the transformed
fields solid lines correspond to using parameters determined from
the smoothed fields (BC2, LOG2), and dashed lines to parameters
obtained from the unsmoothed fields (BC1, LOG1). The
effect on the S/N of smoothing with a Gaussian kernel of width
5 pixels is indicated for the Box-Cox (BC1) transformed case by
the black dotted line. For reference the Gaussian limit resulting
from uncorrelated angular frequencies is shown as grey curve.



convergence power spectra departs from this ideal already
at ℓ ~ 200, reaching a value of about 12.1 at ℓ = 1500,
the maximum angular frequency we use for the likelihood
analysis.

In agreement with Seo et al. (2011), the cumulative S/N
for the transformed power spectra remains close to the
Gaussian limit up to ℓ ~ 1000, yielding an increase in S/N by
a factor of 2.6 (2.0) for transformations based on the
smoothed (unsmoothed) convergence fields. Again, choosing
optimal Box-Cox parameters or the respective logarithmic
transformation makes little difference to the performance.
The BC2/LOG2 transformations concentrate the S/N into angular
frequencies up to ℓ = 1000 and level off in the regime where
the smoothing washes out information (the increase in S/N for
LOG2 at very high ℓ is probably a noise artefact in the
inverted covariance).

The cumulative S/N for the BC1/LOG1 transformations
has a shallower slope for ℓ ≲ 1000, but surpasses
the S/N for the BC2/LOG2 case beyond ℓ ~ 2500, even
if the convergence fields are also smoothed with the same
kernel. This behaviour is also reflected in r_c, where the
BC2/LOG2 transformations suppress cross-correlations better
in the range 200 < ℓ < 1000. The strong rise in S/N for
the original convergence power spectra and the
BC1/LOG1-transformed power spectra for ℓ > 3000 could be due
to either noise or cosmological information from the highly
non-linear regime, but is in any case inaccessible to us
because of the limitations in modelling.



and logarithmic transformations perform almost identically 
and reduce these correlations substantially, yet in neither 
case to a negligible level. 

These findings are in disagreement with the results of
Seo et al. (2011), who obtained a level of cross-correlations
that is consistent with zero after a logarithmic
transformation of the convergence. However, r_c does not
exceed 0.4 even for their original power spectra, and a direct
comparison is hindered by the different angular frequency
binning, which affects the Gaussian contribution to the
diagonal of the covariance and thereby the normalisation of
r_c. The parameters of the underlying simulations are similar
to ours, except for considerably lower cosmological parameter
values Ω_m = 0.24 and σ_8 = 0.76 (Sato et al. 2009). Thus
non-linear clustering might be less pronounced in these
simulations and hence their mode-coupling effects easier to
remove.

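A first insight into the information content is given by
the cumulative signal-to-noise (S/N), defined by

    {\rm S/N}(\ell_{\rm max}) = \left[ \sum_{\ell,\ell' \le \ell_{\rm max}} P_\kappa(\ell) \, {\rm Cov}_P^{-1}(\ell,\ell') \, P_\kappa(\ell') \right]^{1/2} .   (21)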



Note that we employ the correction factor given in
Hartlap et al. (2007) to obtain an unbiased estimate of the
inverse covariance in the presence of simulation noise. This
step removes the bias introduced when using the inverse of
the sample covariance as an estimator for the inverse
covariance; see Anderson (2003) for details.
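A minimal sketch of the cumulative S/N computation, including the debiasing of the inverse covariance, is given below. It assumes that `cov` is the unbiased sample covariance estimated from `n_realisations` independent realisations and uses the Hartlap et al. (2007) factor (n − p − 2)/(n − 1), where p is the number of bins included. The function names and example numbers are illustrative.

```python
import numpy as np

def hartlap_factor(n_realisations, n_bins):
    """Debiasing factor for the inverse of a sample covariance matrix."""
    return (n_realisations - n_bins - 2) / (n_realisations - 1)

def cumulative_sn(mean_power, cov, n_realisations):
    """Cumulative S/N including all bins up to and including each bin."""
    mean_power = np.asarray(mean_power, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sn = np.empty(len(mean_power))
    for n_bins in range(1, len(mean_power) + 1):
        p = mean_power[:n_bins]
        inv_cov = np.linalg.inv(cov[:n_bins, :n_bins])
        inv_cov *= hartlap_factor(n_realisations, n_bins)
        sn[n_bins - 1] = np.sqrt(p @ inv_cov @ p)
    return sn

# Illustrative example: two uncorrelated bins with unit variance.
sn = cumulative_sn([3.0, 4.0], np.eye(2), n_realisations=100)
print(sn)
```

The sub-matrix is inverted afresh for each maximum bin because the inverse of a truncated covariance is not the truncation of the full inverse when bins are correlated.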

We follow Seo et al. (2011) in using the maximum S/N,
achieved in the limit of a Gaussian random field and given by
the total number of independent modes, as a reference. As is
demonstrated in Fig. 6, the cumulative S/N for the original


4.4 Likelihood analysis 

Due to the computational cost of calculating the
transformed power spectrum models according to equation (12),
we restrict ourselves to the cosmological parameters Ω_m and
σ_8 in the likelihood analysis. This should allow us to study
the effects of Gaussianising transformations on the joint
constraints on cosmology as well as the potential to break the
characteristic degeneracy in the Ω_m − σ_8 plane appearing in
standard analyses of weak lensing two-point statistics. The
signals from all 100 realisations are combined, so that we
reach an effective survey size of 10,000 deg².

We make the assumption of a Gaussian likelihood for
both the original and transformed convergence power spectra,

    L(\{P_\kappa\}, \boldsymbol{p}) \propto \exp \left\{ -\frac{1}{2} \sum_{i,j=1}^{N_\ell} \left[ P_\kappa(\ell_i) - P_\kappa(\ell_i, \boldsymbol{p}) \right] {\rm Cov}_P^{-1}(\ell_i,\ell_j) \left[ P_\kappa(\ell_j) - P_\kappa(\ell_j, \boldsymbol{p}) \right] \right\} ,   (22)

where N_ℓ is the number of angular frequency bins. The
measured power spectrum P_κ(ℓ) and the covariance are extracted
from the simulations, while the cosmology-dependent models
P_κ(ℓ, p) are calculated from equation (12).
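The evaluation of such a Gaussian likelihood on a two-parameter grid can be sketched as follows. The model function `toy_model` is a hypothetical stand-in for the transformed power spectrum model of equation (12), and the coarse grid values are chosen only so that the example runs quickly; neither is taken from this work.

```python
import numpy as np

def log_likelihood(measured, model, inv_cov):
    """Gaussian log-likelihood up to an additive constant."""
    residual = measured - model
    return -0.5 * residual @ inv_cov @ residual

def likelihood_grid(measured, inv_cov, model_fn, omega_m_values, sigma_8_values):
    """Normalised likelihood on a grid; rows index Omega_m, columns sigma_8."""
    log_l = np.array([[log_likelihood(measured, model_fn(om, s8), inv_cov)
                       for s8 in sigma_8_values]
                      for om in omega_m_values])
    like = np.exp(log_l - log_l.max())      # subtract maximum for stability
    return like / like.sum()

def toy_model(omega_m, sigma_8):
    """Hypothetical three-bin model, for illustration only."""
    return sigma_8**2 * omega_m ** np.array([1.0, 1.5, 2.0])

omega_m_values = np.array([0.15, 0.27, 0.40, 0.70])
sigma_8_values = np.array([0.45, 0.81, 1.00, 1.20])
measured = toy_model(0.27, 0.81)            # noise-free fiducial "data"
inv_cov = np.eye(3) * 1.0e4                 # assumed diagonal inverse covariance

posterior = likelihood_grid(measured, inv_cov, toy_model,
                            omega_m_values, sigma_8_values)
print(np.unravel_index(np.argmax(posterior), posterior.shape))
```

With flat top-hat priors over the grid, the normalised likelihood doubles as the posterior from which confidence contours are drawn.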

In total N_ℓ = 10 bins in the range 150 < ℓ < 1500
are included in the likelihood. We exclude lower angular
frequencies to avoid systematic effects in the simulation
power spectra due to the discreteness error caused by the
limited number of Fourier modes per angular frequency bin at
low ℓ; see Kiessling et al. (2011) for details. At ℓ > 1500
shot noise be-
Figure 7. Histograms of the distribution of convergence power
spectra, transformed to zero mean and unit variance, for
100 < ℓ < 500 (left panels), and 500 < ℓ < 3500 (right panels).
The top row corresponds to power spectra computed from the
original convergence fields, and the bottom row to those
obtained from the Box-Cox transformed fields (BC1). The red
curves are unit Gaussians with a normalisation adapted to the
number of data in the histograms. The horizontal axes show
[P_κ(ℓ) − ⟨P_κ(ℓ)⟩] / σ.



comes relevant, and the power spectra are largely suppressed 
by the smoothing. 

If the Gaussian approximation were of different accuracy
for the original and the transformed power spectra, a
fair comparison of the resulting parameter constraints would
be hampered. Therefore we inspect the distribution of power
spectrum values from all realisations in the linear regime
100 < ℓ < 500 and the non-linear regime 500 < ℓ < 3500,
where scales are not yet dominated by shot noise. The results
for the original and the BC1-transformed power spectra are
presented in Fig. 7.

At high ℓ the distribution of the original convergence
power spectra is marginally left-skewed, which is removed
after Box-Cox transformation. In the linear regime κ should
be Gaussian distributed anyway, so that the power spectrum
histogram should be well described by a χ² distribution,
which is expected to be very close to Gaussian due to
the large number of modes included. However, both original
and transformed distributions are mildly skewed, an effect
which was also evident in the results of Kiessling et al.
(2011). We suspect that this is caused by the apodisation in
the power spectrum estimation and will investigate this
effect elsewhere. Since the deviation from Gaussianity is
small and similar for all power spectra considered, and since
the bulk of the cosmological information stems from angular
frequencies above 500 (see the error bars in Fig. 4), we
conclude that the assumption of a Gaussian likelihood is
justified.

We compute power spectrum models from equation (12)
on a grid in the Ω_m − σ_8 plane with boundaries
Ω_m ∈ [0.15; 0.70] and σ_8 ∈ [0.45; 1.20], which we treat as
conservative top-hat priors. The resulting 2σ confidence
levels from the subsequent likelihood evaluation are shown in
Fig. 8. The constraints from the original convergence power
spectrum feature the typical banana-shaped degeneracy. The
fiducial cosmology at Ω_m = 0.27 and σ_8 = 0.81 is enclosed in
the contours, nearly coinciding with the maximum likelihood
point.

The Box-Cox and logarithmic transformations
BC1/LOG1 produce similar constraints which are very
narrow transverse to the degeneracy line, but the extent
of the contours along the degeneracy is increased, in
particular in the case of the BC1 transformation, for which
a secondary, very elongated peak along the degeneracy
line can be found at high Ω_m and low σ_8. This indicates a
nearly perfect degeneracy between Ω_m and σ_8, which is even
more pronounced for the BC1 transformation although it
results in a higher cumulative S/N. The confidence region
of the transformed power spectra is slightly tilted with
respect to the original one, but the 2σ contours of both the
LOG1 and BC1 results enclose the fiducial cosmology. The
maximum likelihood is located at slightly higher values of
σ_8 than the fiducial one, which is in agreement with the
model for the fiducial cosmology being marginally low in
overall amplitude; see the centre panel of Fig. 4.

Given the persistent degeneracy between Ω_m and σ_8,
marginal errors on these parameters are of little value.
Instead we employ q-values, defined as q = \sqrt{\det Q}
(Kilbinger & Schneider 2004), as a figure of merit, being a
measure of the area enclosed by the confidence contours.
They are based on the quadrupole of the posterior distribution

    Q_{\mu\nu} = \sum_{\boldsymbol{p}} L(\{P_\kappa\}, \boldsymbol{p}) \, (p_\mu - p_{{\rm max},\mu}) (p_\nu - p_{{\rm max},\nu}) ,   (23)

where p_max marks the point of maximum likelihood. The
q-values which correspond to the parameter constraints shown
in Fig. 8 are summarised in Table 3. Note that smaller
Figure 8. Left panel: 2σ confidence levels for the likelihood analysis based on the original convergence power spectra (red solid lines),
the Box-Cox transformed power spectra (black solid lines), and the logarithmically transformed power spectra (blue dotted lines). No
smoothing is used in the analysis of the original power spectra; the convergence transformations are based on parameters determined
from the unsmoothed convergence fields (BC1, LOG1). The fiducial cosmology of the simulation is marked by the black point. Note the
secondary likelihood peak at high Ω_m and low σ_8 in the Box-Cox transformed case. Right panel: Same as above, but including smoothing
in the case of the original power spectra and using parameters determined from the smoothed convergence fields (BC2, LOG2). For
reference contours for the unsmoothed original likelihood analysis are shown in grey. Note the strong bias in the LOG2 case.



Table 3. Figure of merit in terms of q-values in the Ω_m − σ_8 plane.
The likelihood analysis has been performed for the original
convergence fields, the Box-Cox transformed fields, and the
logarithmically transformed fields. The second column contains
results based on transformations determined from the
unsmoothed convergence (BC1, LOG1), the third column those
based on transformations determined from the smoothed
convergence (BC2, LOG2), and the fourth column those for the
analysis including
shape noise. Values of q are given in units of 10~^. Note that 
smaller q-values correspond to tighter parameter constraints. 
q-values correspond to tighter parameter constraints and
hence a better performance of the transformations.
Transforming the convergence according to the parameter sets
BC1/LOG1 yields an increase in this figure of merit, i.e. a
degradation of constraints, by roughly an order of magnitude.
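The q-value can be computed from a gridded posterior along the following lines. This is a sketch under the assumption that q is the square root of the determinant of the second-moment matrix Q of the normalised posterior, taken about its maximum; the function name and the Gaussian test posterior are illustrative and not taken from this work.

```python
import numpy as np

def q_value(posterior, omega_m_values, sigma_8_values):
    """sqrt(det Q), with Q the second moments about the posterior maximum.

    posterior has shape (len(omega_m_values), len(sigma_8_values)).
    """
    post = np.asarray(posterior, dtype=float)
    post = post / post.sum()
    i_max, j_max = np.unravel_index(np.argmax(post), post.shape)
    d_om = omega_m_values[:, None] - omega_m_values[i_max]
    d_s8 = sigma_8_values[None, :] - sigma_8_values[j_max]
    q11 = np.sum(post * d_om**2)
    q22 = np.sum(post * d_s8**2)
    q12 = np.sum(post * d_om * d_s8)
    return float(np.sqrt(q11 * q22 - q12**2))

def gaussian_posterior(omega_m_values, sigma_8_values, sig_om, sig_s8):
    """Uncorrelated Gaussian test posterior centred on (0.27, 0.81)."""
    x = (omega_m_values[:, None] - 0.27) / sig_om
    y = (sigma_8_values[None, :] - 0.81) / sig_s8
    return np.exp(-0.5 * (x**2 + y**2))

omega_m_values = np.linspace(0.15, 0.70, 111)
sigma_8_values = np.linspace(0.45, 1.20, 151)

wide = gaussian_posterior(omega_m_values, sigma_8_values, 0.02, 0.03)
narrow = gaussian_posterior(omega_m_values, sigma_8_values, 0.01, 0.015)
print(q_value(wide, omega_m_values, sigma_8_values),
      q_value(narrow, omega_m_values, sigma_8_values))
```

For an uncorrelated Gaussian posterior q reduces to the product of the two standard deviations, so halving both widths reduces q by a factor of four.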

The S/N discussed in the foregoing section is equivalent
to the Fisher matrix with the amplitude of the power
spectrum as the single inferred parameter. Hence the S/N
can also be used as a measure for the change in constraints
when only σ_8 is varied. Contrasting the doubling in S/N
with the pronounced increase in q, it is clearly the failure
to break the Ω_m − σ_8 degeneracy that hinders a stronger
improvement in the figure of merit.

We repeat the likelihood analysis for the original
convergence power spectra smoothed with the same kernel as
the transformed convergence fields; see Fig. 8, right panel.
The smoothing affects the area of the confidence region only
marginally, but the suppression of the signal from high
angular frequencies shifts the contours upwards along the
degeneracy line. The corresponding q-value increases slightly.

The likelihood analysis for the transformation BC2,
which boosts the cumulative S/N more strongly than BC1/LOG1,
results in q-values that are a factor of 4.4 smaller than for
the original convergence (Table 3). The confidence region
still features a degeneracy between Ω_m and σ_8, but has
shrunk considerably. We observe a similar tilt of the
degeneracy line as for the BC1/LOG1 case and a mild bias, the
2σ confidence level touching the point of the fiducial
cosmology.

In stark contrast to this, the LOG2 transformed models
fail to fit the simulation power spectra, leading to a strong
bias in cosmological parameters and a very strong degeneracy
between Ω_m and σ_8. The LOG2 transformation features
by far the smallest value of the shift parameter a and
therefore the strongest boost of higher-order contributions.
As we detail in Appendix B, terms of the order P* and higher,
which we are unable to model, are likely to become relevant
in this case, particularly on small scales where the amplitude
of the model is correspondingly low (Fig. B1).



4.5 Effect of shape noise 

So far we have considered an idealistic experiment with noise
levels that cannot be achieved even by future weak lensing
experiments. For instance, the deep, space-based COSMOS
survey contains n_gal = 76 arcmin^-2 galaxies usable for shape
measurement (Schrabback et al. 2010), which would still
produce noise power more than an order of magnitude larger
than the shot noise level. We assume n_gal = 30 arcmin^-2,
which the planned Euclid mission aims for, resulting in a
noise power spectrum P_noise = σ_ε²/(2 n_gal) ~ 1.7 × 10^-10.
Transformations are only determined from the smoothed
convergence fields as otherwise the (Gaussian) shape noise
Figure 9. Same as Fig. 3, but for the case with shape noise added
to the convergence fields. Additionally, the resulting convergence
distribution after a log-arctan transformation is shown as green
dashed line.

would dominate the one-point distribution of convergence 
values and hence obscure any cosmological effects. 
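The quoted shape noise level can be reproduced with a few lines of arithmetic. The sketch below evaluates P_noise = σ_ε²/(2 n_gal) after converting the galaxy density to steradians. The ellipticity dispersion σ_ε = 0.35 is an assumption made here because it reproduces the quoted value; it is not stated in this section.

```python
import numpy as np

ARCMIN = np.pi / (180.0 * 60.0)        # one arcminute in radians

def shape_noise_power(sigma_eps, n_gal_per_arcmin2):
    """Shape noise power sigma_eps^2 / (2 n_gal), with n_gal per steradian."""
    n_gal_per_sr = n_gal_per_arcmin2 / ARCMIN**2
    return sigma_eps**2 / (2.0 * n_gal_per_sr)

SIGMA_EPS = 0.35                       # assumed total ellipticity dispersion

p_euclid = shape_noise_power(SIGMA_EPS, 30.0)   # density assumed in the text
p_cosmos = shape_noise_power(SIGMA_EPS, 76.0)   # COSMOS-like density

print(f"n_gal = 30 arcmin^-2: {p_euclid:.2e}")
print(f"n_gal = 76 arcmin^-2: {p_cosmos:.2e}")
```

Because the noise power scales inversely with the galaxy density, even the COSMOS-like density lowers it by only a factor of about 2.5 relative to the 30 arcmin^-2 case.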

Since the overall minimum of the convergence is very
similar to the case of the unsmoothed noise-free maps, we
again set a = 0.07. The optimisation procedure for the
Box-Cox parameters prefers strongly negative values of
λ ≪ −10, mainly to reduce the residual kurtosis, which causes
numerical issues, e.g. due to a very small variance of the
transformed convergence. Hence we restrict λ to a moderately
negative value of approximately −7.5 and choose a such that
(λ, a) lies on the degeneracy line of close to optimal
Gaussianity, analogous to the one observed in Fig. 2; see
Table 1 for an overview of the transformation parameters.

Figure 9 shows that both logarithmic and Box-Cox
transformations (designated LOGs and BCs, respectively)
struggle to render the one-point distribution of the
transformed convergence Gaussian. The original distribution
still features a long positive tail caused by clustering, but
values of κ below the mean now have a shallower slope closer
to a Gaussian due to shape noise. The transformations are
capable of reducing the skewness of this hybrid distribution
to negligible values, but the Mexican-hat shaped residuals
in Fig. 9, top panel, indicate that a significant positive
excess kurtosis remains; see Table 2 for the statistics.
Consequently, D_KL for the transformed fields is larger than
in the cases without shape noise, while the opposite holds for
the original convergence, so that one expects overall less
improvement through the Gaussianising transformations.

Figure 10. Same as Fig. 4, bottom panel, but showing the Box-Cox
transformed power spectrum in the presence of shape noise,
again smoothed with a Gaussian kernel of width 5 pixels. Note
that the shape noise is included both in the simulation power
spectrum and the model.

To ensure that the limited ability of Box-Cox-type
transformations to arrive at a Gaussian one-point distribution
(which could in principle be overcome by a rank-order
Gaussianisation procedure) does not mislead our conclusions
on the information content of Gaussianised fields, we introduce
yet another type of transformation which fares better
in the case of noisy convergence fields. Noting that the main
flaw of the BCs/LOGs transformations is a significantly
leptokurtic result, we define

$\tilde{\kappa}_i(s, a) = \arctan\left[ s \ln(\kappa_i + a) \right]$ , (24)

i.e. after the LOGs transformation we additionally apply the
arc-tangent, using a free scaling $s$ as a second free parameter.
We illustrate the mapping of equation (24) in Fig. 1.
Via a straightforward generalisation of the Box-Cox formalism
one can derive an optimisation for the transformation
parameters in analogy to equation (6), as well as models for
the transformed power spectrum by means of the procedure
presented in Section 3.2 and Appendix B.
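As a rough illustration of why the additional arc-tangent helps, the sketch below applies equation (24) to a toy field and compares the excess kurtosis before and after. The toy field is built so that ln(kappa + a) is symmetric about zero with heavy (Laplace) tails. This is an assumption chosen to isolate the effect on the kurtosis, not a model of a real convergence map, and the scaling s = 1.5 is arbitrary.

```python
import numpy as np


def log_arctan(kappa, s, a):
    """Log-arctan transformation: arctan of the scaled, offset logarithm."""
    return np.arctan(s * np.log(np.asarray(kappa, dtype=float) + a))


def skewness(x):
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def excess_kurtosis(x):
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0


# Toy field (illustrative only): ln(kappa + a) follows a Laplace
# distribution, i.e. it is unskewed but clearly leptokurtic.
rng = np.random.default_rng(1)
a = 0.07
t = rng.laplace(0.0, 0.3, size=50000)
kappa = np.exp(t) - a

logs = np.log(kappa + a)                 # offset-logarithm result
arct = log_arctan(kappa, s=1.5, a=a)     # log-arctan result

k_log = excess_kurtosis(logs)
k_arctan = excess_kurtosis(arct)
print("excess kurtosis after log:       ", round(k_log, 2))
print("excess kurtosis after log-arctan:", round(k_arctan, 2))
print("skewness after log-arctan:       ", round(skewness(arct), 2))
```

The scaling s controls how strongly the tails are compressed, so in practice it would be tuned together with a rather than fixed by hand as done here.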

In our models we incorporated terms up to third order
in $\kappa$, where the $\kappa^3$ contribution only entered the first
four-point term in equation (10); see also the Appendix. Apart
from an irrelevant overall rescaling with $s$, only this term
is modified, as is readily seen by consulting the Taylor
expansion $\arctan x = x - x^3/3 + O(x^5)$. To leading order,
the arc-tangent is the identity transform, and the next-to-leading
order can contribute only to terms that are third
order in $\kappa$ or higher. As Fig. 9 demonstrates, this log-arctan
transformation indeed results in a Gaussianised one-point
distribution for the convergence with an accuracy comparable
to the noise-free case (see also Table 2).

Figure 10 shows the contributions to the model of the
BCs-transformed power spectrum, again obtained from the
analytic model of Section 3.2, which provides a good fit to the
mean from the simulation. Although $\lambda \ll 0$, the three-point,
four-point, and higher-order contributions are small and
sequentially decline in amplitude because $a$ is almost an order
of magnitude larger than for the noise-free Box-Cox transformations.
Note that both model and simulation include shape noise,
which is visible as the bump at $\ell > 500$.

Figure 11. Same as Fig. 6 but based on convergence fields with
shape noise. Results for the original (Box-Cox transformed;
logarithmically transformed) convergence fields are again shown as
red solid (black solid; blue dotted) curves. In addition we plot the
S/N obtained with the log-arctan transformation as green dashed
line. All fields have undergone smoothing with a Gaussian kernel
of width 5 pixels. The Gaussian limit (attainable without shape
noise) is shown as grey curve; the limit including shape noise in
the covariance is shown as grey dotted curve. Note the different
scaling of the abscissa compared to Fig. 6.

Gaussian shape noise adds only to the diagonal of the
power spectrum covariance and thus reduces the importance
of off-diagonal terms. This implies a decrease in the correlation
coefficient $r_c$ for the original convergence power spectrum,
as is evident in the bottom right panel of Fig. 5. The gain in
decorrelation due to the transformation of the convergence is
largely reduced, as $r_c$ changes little compared to the noise-free
case and even marginally increases around $\ell \sim 500$. The
different transformations perform similarly, where as a trend we
find that the closer the transformed one-point distribution for
$\kappa$ is to a Gaussian, the smaller the cross-correlations between
angular frequencies.
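The effect of diagonal noise on the correlation coefficients can be illustrated with a toy covariance. The matrix below is invented for the purpose of the example; only the normalisation of the covariance by its diagonal is the standard definition of a correlation coefficient.

```python
import numpy as np


def correlation_coefficients(cov):
    """r_ij = C_ij / sqrt(C_ii C_jj) for a covariance matrix C."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


# Toy power spectrum covariance with uniform cross-correlation 0.6
# between five band powers (illustrative numbers only).
n = 5
signal_cov = 0.6 * np.ones((n, n)) + 0.4 * np.eye(n)

# Gaussian shape noise contributes to the diagonal only.
noisy_cov = signal_cov + np.diag(np.full(n, 1.0))

r_signal = correlation_coefficients(signal_cov)
r_noisy = correlation_coefficients(noisy_cov)
print("off-diagonal r without noise:", r_signal[0, 1])
print("off-diagonal r with noise:   ", r_noisy[0, 1])
```

Because the noise already lowers the off-diagonal coefficients of the original field, a transformation has less correlation left to remove.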

The same conclusion holds for the cumulative S/N displayed
in Fig. 11, yielding improvements of 34 % (20 %; 13 %)
by the log-arctan (BCs; LOGs) transformation over the S/N
of the original convergence power spectrum at $\ell = 1500$.
However, all curves deviate strongly from the Gaussian limit
(which can only be reached if noise contributions are negligible)
from $\ell \approx 200$ onwards. This remains true even if we
consider the S/N using a Gaussian covariance with shape
noise included, so that the comparatively low S/N is mainly
caused by the cross-correlation of angular frequencies, and
not by the higher noise levels. Again these results differ from
the findings by Seo et al. (2011), who assume the same number
density of galaxies, but whose cumulative S/N degrades
less in the presence of shape noise. We can only speculate at
this point that this discrepancy might, as in the noise-free
case, be related to the different levels of non-linear structure
evolution in the underlying N-body simulations.
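The role of cross-correlations in limiting the cumulative S/N can be sketched as follows. The code assumes the commonly used definition (S/N)^2 = P^T C^{-1} P summed over all bins up to a maximum frequency, which may differ in detail from the expression used in the paper. The flat toy spectrum and the covariances are invented for illustration.

```python
import numpy as np


def cumulative_sn(power, cov):
    """Cumulative S/N using the first k band powers, for k = 1..n."""
    out = []
    for k in range(1, len(power) + 1):
        p = power[:k]
        c = cov[:k, :k]
        out.append(np.sqrt(p @ np.linalg.solve(c, p)))
    return np.array(out)


n = 20
power = np.ones(n)                      # toy band powers
gauss_cov = 0.01 * np.eye(n)            # Gaussian (diagonal) covariance
r = 0.5                                 # assumed uniform cross-correlation
corr_cov = 0.01 * ((1.0 - r) * np.eye(n) + r * np.ones((n, n)))

sn_gauss = cumulative_sn(power, gauss_cov)
sn_corr = cumulative_sn(power, corr_cov)
print("S/N at highest bin, Gaussian:  ", round(sn_gauss[-1], 1))
print("S/N at highest bin, correlated:", round(sn_corr[-1], 1))
```

In this toy case the diagonal entries are identical in both covariances, so the entire loss in S/N comes from the off-diagonal terms.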

Figure 12. Same as Fig. 8 but for the case with shape noise
included in the convergence fields. Again the fiducial cosmology
is indicated by the black point.

Note that shape noise of course adds to the covariance,
but should not be included in the signal, i.e. it should not enter
the power spectra used in the S/N computation. The usual approach
of subtracting the shape noise power spectrum from the observed
signal does not work after non-linear transformations
of the convergence, which spread noise contributions to all
terms of even order in $\kappa$ (see the model of the transformed
power spectrum in Section 3.2). Since in this case our analytic
models fit the simulation well, we recompute the model without
shape noise and use this result in the S/N computation.

Both $q$-values and contours change less after Gaussianising
the convergence than for the noise-free transformations; see
Fig. 12 and the accompanying table of $q$-values. Despite the
increase in S/N, and although the transformations have been
optimised for the smoothing and noise level present in the
convergence fields, parameter constraints mildly degrade. The
$2\sigma$ contours for the Box-Cox and logarithmic transformation
have a similar form, being slightly more concentrated in the
former case (hence we expect analogous results, with possibly
marginally tighter constraints still, for the log-arctan
transformation). The degeneracy between the cosmological
parameters is once again more pronounced than for the original
likelihood analysis, the degeneracy line being tilted in
the same way as in the noise-free cases. The confidence regions
comfortably enclose the fiducial cosmology, so that in
the most realistic situation of a convergence with shape noise
our modelling is reliable and thus our conclusions robust.



5 INTERPRETATION AND DISCUSSION 

5.1 Performance of Gaussianising transformations 

Generally we can confirm earlier results that a logarithmic 
transformation of the weak lensing convergence renders its 
one-point distribution close to Gaussian, mainly via remov- 
ing the skewness induced by structure evolution. Optimised 
Box-Cox transformations perform in all considered cases 
significantly better in Gaussianising the convergence distri- 
bution, but do not necessarily produce better constraints 
on cosmology than the logarithmic transformation. This 
suggests that any fine-tuning on the shape of the trans- 
formed one-point distribution has only a modest effect on 
the amount of cosmological information in the transformed 
two-point statistics, implying that little could be gained by 
using a perfect rank-order Gaussianisation as in Yu et al.
(2011).

Figure 13. Comparison between the $2\sigma$ confidence levels
obtained from the power spectrum likelihood analysis (red solid line;
cf. Fig. 8) and the convergence likelihood analysis (black dashed
line) after Box-Cox transformation BC1.



For all three situations we study, both logarithmic and
Box-Cox transformations fail to reduce correlations between
power spectra at different angular frequencies to a negligible
level, so that a non-vanishing connected trispectrum must
be present in the transformed convergence fields. Together
with the measurement of a non-zero bispectrum from a perfectly
(one-point) Gaussianised field by Yu et al. (2011), this
provides firm evidence that manipulating the one-point
distribution is insufficient to turn the convergence into a
Gaussian random field. As discussed in Yu et al. (2011),
this also implies that the assumption of a Gaussian copula
(Scherrer et al. 2010; Sato et al. 2011) to describe the
convergence field is of limited accuracy.

In addition to concentrating cosmological information
into two-point statistics, a Gaussianised convergence would
allow one to use an exact functional form for the likelihood.
Instead of assuming a Gaussian likelihood for weak lensing
two-point statistics, which cannot be accurate due to the effects
of non-linear structure formation (Hartlap et al. 2009)
and because of theoretical arguments (Schneider & Hartlap
2009), one treats $\kappa$ itself as the data for which the Gaussian
assumption then holds. In Appendix C we outline the
likelihood formalism for $\kappa$ and show that the Fisher information
in the likelihood for $\kappa$ is equivalent to that in the
likelihood for the power spectrum if the latter is Gaussian and
contains a Gaussian covariance.

We compare the constraints from the two likelihood
formalisms for a Box-Cox transformed (BC1) convergence
without shape noise in Fig. 13. The resulting confidence
levels are largely different, the likelihood based on $\kappa$ as
the data-vector having considerably less constraining power.
The difference can be ascribed to the residual connected
four-point correlations in the convergence fields, which can
be incorporated into the power spectrum likelihood via the
simulation covariance matrix with its off-diagonal terms, but
not into the likelihood for $\kappa$, which includes at most terms
that are second order in $\kappa$. This result suggests that the
residual non-Gaussianity of the convergence after transformation
is not a small effect, and that the information in
the transformed connected trispectrum helps considerably in
constraining cosmological parameters.



To improve the performance, it is therefore necessary
to go beyond transformations of the one-point distribution.
Box-Cox transformations are readily applied to
multi-dimensional data (Velilla 1993), so that one could in
principle compose a large data-vector of all convergence values
on the gridded $\kappa$ map and assign an individual pair of
Box-Cox parameters $(\lambda, a)$ to each entry. As a consequence the
transformation becomes scale-dependent, which violates the
statistical translational invariance of the convergence fields
and is thus undesirable*. The same holds for a global
transformation of the Fourier-transformed convergence values
$\kappa_\ell$, which couples spherical harmonics, i.e. angular
frequencies $\ell$ with $2\ell$, $3\ell$, $\ell/2$, etc.

Hence, the only practical option seems to be retaining a
global transformation of the real-space convergence, but using
a multi-dimensional $\kappa$ data-vector, thereby taking spatial
correlations within the convergence map into account. Then
the variance entering the optimisation of the transformation
parameters needs to be replaced by the full covariance of the
$\kappa$ values in the data-vector, readily obtained by measuring
the correlation function $\xi_\kappa(\theta) = \xi_+(\theta)$
from the fields. As an aside, note that it is not obvious
how to generalise rank-order Gaussianisation procedures to
more than one-dimensional data as they rely on the concept
of a cumulative probability density function. The necessary
statistics to optimise the transformation parameters in this
multivariate case could either be obtained from a large number
of simulation realisations or by exploiting translational
and rotational invariance of a single simulation or observational
data.
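The reliance of rank-order Gaussianisation on a one-dimensional cumulative distribution is easy to see in code. The following is a minimal sketch of such a mapping for a set of pixel values. It is not the implementation used in any of the cited works, and the lognormal toy data are an assumption.

```python
import numpy as np
from statistics import NormalDist


def rank_order_gaussianise(values):
    """Map each value to the standard-normal quantile of its rank."""
    x = np.asarray(values, dtype=float).ravel()
    ranks = np.argsort(np.argsort(x))        # ranks 0 .. n-1 (no ties assumed)
    cdf = (ranks + 0.5) / x.size             # empirical cumulative distribution
    inv = NormalDist().inv_cdf
    out = np.array([inv(float(p)) for p in cdf])
    return out.reshape(np.shape(values))


rng = np.random.default_rng(2)
toy = rng.lognormal(0.0, 0.7, size=2000)     # strongly skewed toy values
gauss = rank_order_gaussianise(toy)
print("mean:", round(float(gauss.mean()), 3), "std:", round(float(gauss.std()), 3))
```

The mapping only needs an ordering of the values, which exists for scalars but has no unique counterpart for a multi-dimensional data-vector.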

Although fairly comprehensive, the flexibility of Box-Cox
transformations encapsulated in the parameters $\lambda$ and
$a$ might not suffice to Gaussianise the multivariate distribution
of convergence values to the desired accuracy. The
formalism used in this work to find optimal transformation
parameters via equation (6) and develop models of the transformed
power spectrum (see Section 3.2) is applicable to any
parametrised, analytical set of transformations, so it could
e.g. be used to explore general parametrisations with a larger
number of free parameters.

It would be desirable to derive a more physically mo- 
tivated set of transformations which ideally describe a bi- 
jective mapping from the present-day convergence field to a 
nearly Gaussian convergence that one would have observed 
at an early stage of structure formation. Since matter in 
high-density regions is virialised and thus by definition
cannot remember its original trajectory, such a mapping can
in principle only exist down to a certain spatial scale. We
defer the investigation of advanced transformations of the 
convergence as outlined above to future work. 



5.2 Extraction of cosmological information 

Throughout this work we use invertible transformations, so
that the mapping should preserve the cosmological information
contained in the convergence fields (we ignore the
smoothing for the moment). This information is distributed
over the n-point statistics of the field, and in the dependence
of these statistics on the amplitude and phases of the angular
frequencies (or equivalently angular scales) involved. The
power spectrum only depends on the absolute value of
$\boldsymbol{\ell}$ and not its phase, while higher-order statistics
also vary e.g. as a function of the internal angles of the triangle,
quadrangle, etc. at which they are evaluated. If the convergence
were transformed into a perfect Gaussian random field, all
information in the amplitude and phase dependence of all n-point
statistics would be transferred into the amplitude dependence of
the transformed power spectrum.

* This is readily shown by introducing a dependence of $\lambda$ and
$a$ on position in the definition of the transformation and then
repeating the computation of the correlator
$\langle \kappa(\boldsymbol{\ell})\, \kappa^*(\boldsymbol{\ell}') \rangle$.

With this in mind we will attempt to elucidate why none
of the transformations could efficiently break the degeneracy
between $\Omega_{\rm m}$ and $\sigma_8$, thereby limiting the
improvement in, or even degrading, the figure of merit. Our models
include both the power spectrum and bispectrum as the dominant
contributions, and their combined analysis has been
proven to break this parameter degeneracy (Takada & Jain
2004; Bergé et al. 2010). However, these works simplistically
assumed that there is no cross-variance between two- and
three-point statistics. Since working with Box-Cox transformed
power spectra does not suffer from this simplification,
our findings could indicate that the five-point cross-variance
between power spectrum and bispectrum partly
eliminates the complementarity of these statistics.

Alternatively, the transformations we considered might
have failed to incorporate information which is capable of
breaking the $\Omega_{\rm m}$-$\sigma_8$ degeneracy, e.g. from the
bispectrum, into the transformed power spectrum. Note that only
integrals over the higher-order spectra contribute to the
transformed power spectrum $P_{\tilde\kappa}$; see equation (10).
For instance, the triangles of angular frequencies at which
the bispectrum is evaluated have one fixed side length $\ell$,
and all possible positions of the third point of the triangle
are averaged over in the integration, thereby diluting the
independent phase information in $B_\kappa$.

We demonstrate the effect on the sensitivity to cosmological
parameters by comparing the derivatives of the
quantities involved with respect to $\Omega_{\rm m}$ and $\sigma_8$.
A perfect degeneracy between the two parameters is expected if
their derivatives have exactly the same dependence on angular
frequency over the range considered for the likelihood analysis.
Thus we use the relative difference in the derivatives of a
quantity $S$, denoted $r_{\rm diff}$, as a measure for
degeneracy-breaking capabilities. Here, $S$
stands for the quantity whose properties are tested, i.e. the
convergence power spectrum or bispectrum, the transformed
power spectrum, as well as its three-point contribution. To
simplify the visual inspection, we normalise the ratio of
derivatives to unity at $\ell = 1300$, i.e. in the regime where the
S/N is highest. Hence, a flat $r_{\rm diff}$ indicates a strong
degeneracy between parameters, whereas a strongly varying
$r_{\rm diff}$ and in particular a steep slope at the pivot $\ell$
signify non-degenerate constraints.
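The diagnostic can be sketched numerically. Since the defining equation is not available in this section, the code assumes that the measure is the ratio of the two parameter derivatives, normalised to unity at the pivot, minus one. The toy spectrum and its parameters are hypothetical and only serve to contrast a degenerate with a non-degenerate parameter pair.

```python
import numpy as np


def r_diff(model, params, i, j, ell, ell_pivot, step=1e-4):
    """Assumed definition: (dS/dp_i)/(dS/dp_j), normalised at the pivot, minus 1."""
    def deriv(k):
        up, dn = list(params), list(params)
        up[k] += step
        dn[k] -= step
        return (model(ell, *up) - model(ell, *dn)) / (2.0 * step)

    ratio = deriv(i) / deriv(j)
    pivot = int(np.argmin(np.abs(ell - ell_pivot)))
    return ratio / ratio[pivot] - 1.0


def toy_spectrum(ell, amp1, amp2, slope):
    """Hypothetical spectrum: two amplitude-like parameters and one slope."""
    return amp1 * amp2**2 * (ell / 1000.0) ** slope


ell = np.arange(100.0, 1501.0, 10.0)
params = (1.0, 0.8, -1.0)

# Two amplitude-like parameters: derivatives share the same ell dependence.
r_amp = r_diff(toy_spectrum, params, 0, 1, ell, 1300.0)
# Slope versus amplitude: derivatives differ by a factor ln(ell / 1000).
r_slope = r_diff(toy_spectrum, params, 2, 1, ell, 1300.0)

print("max |r_diff|, amplitude pair:", float(np.abs(r_amp).max()))
print("max |r_diff|, slope pair:    ", float(np.abs(r_slope).max()))
```

The flat curve for the amplitude pair corresponds to a perfect degeneracy, while the slope pair yields a curve that varies strongly across the frequency range.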

Figure 14. Relative difference $r_{\rm diff}$ between derivatives with
respect to $\Omega_{\rm m}$ and $\sigma_8$ as a function of angular
frequency. Top panel: Normalised $r_{\rm diff}$ for the three-point
contribution to the Box-Cox transformed power spectrum (black solid
line), the bispectrum of the original convergence using equilateral
triangles of side length $\ell$ (red solid line), and the bispectrum of
the original convergence using isosceles triangles with two side
lengths fixed at $\ell_1 = \ell_2 \approx 1265$ and third side length
$\ell$ (blue solid line). Bottom panel: Normalised $r_{\rm diff}$ for the
full Box-Cox transformed power spectrum with parameters BC1 (red solid
line) and BC2 (blue solid line), and the power spectrum of the original
convergence (black solid line). For comparison we have also plotted
$r_{\rm diff}$ for the convergence power spectrum with the derivative
with respect to $\Omega_{\rm m}$ replaced by the derivative with respect
to $n_{\rm s}$ as grey dashed line. In both panels the range used for the
likelihood analysis is marked by vertical lines. All curves have been
normalised to zero at $\ell \approx 1300$, i.e. within the angular
frequency range with highest S/N entering the likelihood analysis.

As is shown in Fig. 14, $r_{\rm diff}$ for the convergence power
spectrum varies only moderately with a rather shallow slope
at $\ell = 1300$, remaining relatively close to zero within the
angular frequency range entering the likelihood analysis. For
reference we also show $r_{\rm diff}$ with the derivative with respect
to $\Omega_{\rm m}$ replaced by the one with respect to the slope of the
initial matter power spectrum, $n_{\rm s}$. This parameter
predominantly affects the slope of the convergence power spectrum
while $\sigma_8$ only changes its amplitude, so that these parameters
are close to orthogonal. Correspondingly, $r_{\rm diff}$ has a
steep slope at the pivot point and attains values more than
an order of magnitude larger than the original $r_{\rm diff}$ with
$\Omega_{\rm m}$.

The curve for the Box-Cox transformed (BC1) power
spectrum has a similar form, so that no significant improvement
in the parameter degeneracy can be expected. In fact,
the degeneracy proves to be much more pronounced in this
case (see Fig. 8), which might also be related to the size
and correlation of the errors on the power spectrum. In the
case of the BC2 transform, for which we found strong constraints,
$r_{\rm diff}$ is even slightly closer to zero over a large portion
of the angular frequency range, i.e. the degeneracy is
still present. However, we find that the relative change of
the BC2-transformed power spectrum with $\Omega_{\rm m}$ and
$\sigma_8$ individually is much stronger than for the original
$P_\kappa$, hence the substantial shrinkage of the confidence region.

Furthermore, the three-point contribution to the transformed
power spectrum, i.e. the integrated bispectrum in
equation (10), again produces a slowly varying $r_{\rm diff}$, albeit
with a differing dependence on angular frequency. Contrasting
this with $r_{\rm diff}$ for the convergence bispectrum evaluated
at two exemplary sets of triangle shapes, one finds a very
similar functional form as for the three-point contribution
to the Box-Cox transformed power spectrum for isosceles
triangles, as well as a more strongly varying curve with steeper
slope for equilateral triangles (which has little effect in practice
as the S/N for equilateral triangles is small; see e.g. Bergé et al.
2010).

Indeed, constraints from the bispectrum alone are also
degenerate, but with a different degeneracy line from the
two-point case, so that a joint likelihood analysis yields much
tightened constraints. Contrary to this, in the likelihood
analysis of Box-Cox or logarithmically transformed power
spectra only a linear combination of these two- and three-point
statistics enters, cancelling the degeneracy-breaking
capabilities to a large degree.
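A deliberately extreme toy Fisher-matrix calculation illustrates the difference between analysing two statistics jointly and analysing only their sum. Each statistic is reduced to a single Gaussian data point, the two are taken to be independent, and all derivative values and variances are hypothetical. In the actual analysis many angular frequency bins enter, so the cancellation is partial rather than complete.

```python
import numpy as np


def fisher(derivs, var):
    """Fisher matrix of one Gaussian data point: d d^T / var."""
    d = np.asarray(derivs, dtype=float)
    return np.outer(d, d) / var


# Hypothetical derivatives with respect to two parameters.
d_two = np.array([1.0, 0.8])     # two-point-like statistic
d_three = np.array([1.0, 2.0])   # three-point-like statistic
var_two, var_three = 0.01, 0.04  # assumed independent errors

# Separate analysis, constraints combined afterwards.
f_joint = fisher(d_two, var_two) + fisher(d_three, var_three)
# Only the sum of the two statistics is observed.
f_sum = fisher(d_two + d_three, var_two + var_three)

det_joint = float(np.linalg.det(f_joint))
det_sum = float(np.linalg.det(f_sum))
print("det F, joint analysis:", det_joint)
print("det F, summed signal: ", det_sum)
```

The joint Fisher matrix is invertible because the two degeneracy directions differ, whereas the summed signal constrains only one parameter combination.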

To summarise, the particular way in which the convergence
statistics are combined to arrive at the Box-Cox transformed
power spectrum implies a dilution and partial cancellation
of cosmological parameter dependencies, thereby
yielding much less improvement in the breaking of the degeneracy
between $\Omega_{\rm m}$ and $\sigma_8$ in the angular frequency
range we consider than if the statistics were analysed separately
and their constraints combined afterwards. See also the perfect
cancellation of terms at different order in the case of
a lognormally distributed convergence (Appendix B). Since
the transformed power spectrum is rather featureless (see
Fig. 4) and thus parameter dependencies are generally difficult
to disentangle, we expect similar conclusions to hold if a
larger set of cosmological parameters is considered.

All conclusions made in this paper are restricted to the
limited range of angular frequencies available for analysis.
One has little control over which angular frequencies cosmological
information is transferred to by the different
transformations^, so it might well be possible that the gain from
Gaussianising the convergence is much higher when extending
the analysis farther into the non-linear regime.

In our case the restrictions in $\ell$ were given by the
resolution of the simulation, but were more stringently determined
by the limitations of analytical modelling of weak
lensing statistics on non-linear scales. While no significant
progress on (semi-)analytical models is to be expected in
the near future, a potential remedy could be provided by
the path integral marginalisation technique developed by
Kitching & Taylor (2010). This way all model uncertainties
could be accurately accounted for, allowing e.g. two-point
statistics at higher angular frequencies or a tree-level
perturbation trispectrum to add to the constraints without risking
parameter bias.

^ Note however the results of Fig. 6: when optimising the
transformations on the smoothed convergence fields, the main increase
in S/N is on scales which are not affected by smoothing, whereas
no independent information is added in the angular frequency
range where smoothing is important. Note further that in practice
not only cosmological information would be re-distributed,
but also remaining systematic effects, complicating the analysis.

Modelling issues can be circumvented by resorting to
a massive suite of simulations to sample parameter space
(e.g. Neyrinck 2011). The immense computational costs are
not necessarily a downside of the Gaussianisation approach,
as standard weak lensing measurements will also eventually
require a large simulation effort to obtain precise models
for two- and higher-order statistics and their covariances.
Even when fully simulating Gaussianised signals, however,
one needs to carefully account for noise and resolution effects,
as our analytical models demonstrate.

5.3 Prospects for an application to real data 

To be a viable alternative to the standard statistical anal- 
ysis of weak lensing data, Gaussianisation methods have to 
work in the presence of a realistic level of shape noise. We 
find a rather poor performance, even after introducing a 
transformation that accurately Gaussianises the one-point 
convergence distribution, with an only modest increase in 
S/N and a small degradation in constraints on cosmology. 
The main reason for this is that shape noise partly takes 
over the Gaussianisation of the data by turning the one- 
point distribution more Gaussian and decorrelating angular 
frequencies, so that there is less room for information gain. 
In addition, Gaussianising the one-point distribution does
worse in bringing the convergence close to a Gaussian random
field, as can be concluded from the increased correlation
coefficient at intermediate scales $300 < \ell < 1000$; compare
the bottom panels of Fig. 5.

Furthermore, this and foregoing work are based on
convergence fields, which however are not directly observable.
Convergence maps can be constructed from
the gravitational shear via grid-based techniques (see
Seitz & Schneider 1997 and references therein) or
pseudo-$C_\ell$ methods (Wandelt et al. 2001; Brown et al. 2005; see
Hikage et al. 2011 for an application to weak lensing). In
practice one needs to take into account the complex masks
applied to weak lensing surveys, which will modify the
distribution of convergence values^. Hence a more flexible
convergence transformation than a fixed logarithm, as provided
by the Box-Cox formalism, could prove fruitful in this case.

As already pointed out above, the results on more realistic
data might be improved by going beyond transforming
only the one-point distribution. Even if that were successful,
and e.g. the bi- and trispectrum in the transformed maps
removed, one still could not rule out the transfer of information
into n-point correlations of the transformed field with
$n \ge 5$. Thus one would not be spared the usage of large
sets of simulations to verify that the noisy nth moment or
n-point statistics are negligible, similar to the need for
covariances of higher-order statistics in the standard analysis
of weak lensing data. Yet, one may be able to build up an
alternative route to inference on cosmology from weak lensing
data, based on Gaussianised convergence fields, which relies
on different assumptions than the standard approach and
therefore provides valuable complementarity.

^ Note that in the case of the CMB analysis the pseudo-harmonic
coefficients are a linear combination of the original Gaussian
distributed coefficients, and thus also follow a Gaussian distribution.
However, a linear combination of non-normal random variables
generally results in another non-normal but differently distributed
quantity, as applies to the weak lensing convergence.



6 CONCLUSIONS 

In this work we investigated the information on cosmology
contained in Gaussianised weak lensing convergence fields,
using Box-Cox transformations of the one-point distribution
of the convergence $\kappa$. We derived an expression for the
power spectrum of the transformed convergence in terms of
the statistics of the original fields and computed models
including contributions up to sixth order in $\kappa$ and taking the
dependence on noise and smoothing into account.

From a set of 100 convergence maps obtained via N-body
simulations we measured the correlation properties
and the cumulative S/N of transformed power spectra for
a number of different transformations. Using our analytical
models, we performed a likelihood analysis jointly on all
simulated maps, deriving constraints on the parameters
$\Omega_{\rm m}$ and $\sigma_8$.

Our main findings can be summarised as follows: 

(i) Optimal Box-Cox transformations prefer in all cases 
considered an even stronger downweighting of high-density 
regions in the convergence map than a logarithmic trans- 
formation and yield excellent results on the Gaussianisation 
of the one-point convergence distribution. The logarithmic 
transformation has results close to this optimum, perform- 
ing slightly worse in Gaussianising the convergence, but in 
some cases producing similar constraints on cosmological pa- 
rameters. The best results are obtained when extracting the 
transformation parameters from a convergence field that has 
already undergone the same smoothing as the fields used for 
the power spectrum estimation and likelihood analysis. 

However, none of the transformations were capable of ren- 
dering the transformed convergence fields close to a Gaus- 
sian random field, despite the one-point distribution being 
very close to Gaussian. We found significant residual corre- 
lations between power spectra at different angular frequen- 
cies, indicative of a non-vanishing connected trispectrum of 
the transformed convergence, and demonstrated that these 
have a strong effect on cosmological constraints if ignored. 
We discussed possible remedies by going beyond transforma- 
tions of the one-point distribution and advertised the Box- 
Cox formalism outlined in this work to be readily applicable 
to the multivariate case and alternative parametrised forms 
of transformations. 

(ii) The accuracy of analytical models for the transformed 
power spectra is limited by the uncertainty in the modelling 
of higher-order convergence statistics as well as by system- 
atic deviations of the fit formulae for the convergence power 
spectrum and particularly the bispectrum in the non-linear 
regime. Due to the non-linearity of the transformations this 
uncertainty affects all scales of the transformed statistics. 

Suppressing contributions from small scales substantially
via smoothing, our models yield good fits to the simulations,
enclosing the true combination of cosmological parameters
within the $2\sigma$ confidence limits in five of six cases. The
modelling fails for a logarithmic transformation with very
small shift parameter $a$, which we demonstrate to be caused
by important contributions from higher-order terms in the
convergence beyond those that we can include. As stronger
smoothing modifies the convergence distribution such that
optimal Box-Cox parameters cause contributions by higher-order
correlations to be even more important (more negative
$\lambda$; $a$ closer to zero), the resulting bias in this one case can
only be removed by further increasing the overall modelling
accuracy.

(iii) The cumulative S/N of the convergence power spectrum
for angular frequencies up to $\ell = 1500$ increases by a factor of
up to 2.6 after applying Box-Cox or logarithmic transformations,
in qualitative agreement with the results by Seo et al.
(2011). We find that the S/N is only a rough indicator of
the strength of cosmological constraints, primarily because
it does not account for degeneracies between parameters.

Measuring the size of the confidence region in the
$\Omega_{\rm m}$-$\sigma_8$ plane in terms of $q$-values, we obtain a
significant degradation due to a near-perfect degeneracy between
$\Omega_{\rm m}$ and $\sigma_8$ if the transformations are determined
from the unsmoothed convergence fields, and a decrease in $q$ by up
to a factor of 4.4 if transformations are optimised for the
smoothing. Although contributions from e.g. the convergence
bispectrum enter the transformed models, the degeneracy between
$\Omega_{\rm m}$ and $\sigma_8$ is broken in neither case, which we
ascribe to the cancellation of information through the integration
over the phase dependence of the bispectrum (and higher-order
correlations) as well as the summation over the two-point,
three-point, and higher-order terms.

(iv) If a realistic level of galaxy shape noise is added to 
the convergence fields, transformations achieve an increase 
in the cumulative S/N by up to 34 %, but leave the statistical 
errors of and correlations between cosmological parameters 
practically unchanged, if not mildly degraded. The failure to 
boost the information contained in the transformed power 
spectrum is firstly caused by the fact that shape noise al- 
ready renders the distribution of convergence values more 
Gaussian, so that there is less to gain by a subsequent Gaus- 
sianisation, and secondly, the decorrelation of angular fre- 
quencies by the transformations performs worse. The lat- 
ter result means that the approximation that Gaussianising 
the one-point distribution renders the full convergence field 
Gaussian is worse in the more realistic case with noise. 

All of the conclusions above depend on the angular frequency
range included in the analysis since cosmological
information might be re-distributed to scales outside this
regime by the transformations and hence not recovered in
our study. The low maximum $\ell = 1500$ in our likelihood
analysis (plus a suppression of signal by smoothing relevant
for $\ell > 1000$) was governed by the limitations of
analytical modelling. One way to extend the analysis deeper
into the non-linear regime is the inclusion of marginalisation
over free functional forms (Kitching & Taylor 2010) to
account for uncertainty in the modelling, which of course
would degrade cosmological parameter constraints. Otherwise
one has to resort to simulations to explore parameter
space for the likelihood of the transformed power spectrum,
as already proposed by Yu et al. (2011); see also Neyrinck
(2011).

To prove that Gaussianising transformations of convergence
fields are a viable and worthwhile approach to
the analysis of upcoming weak lensing surveys, it is foremost
necessary to demonstrate that cosmological information
can be gained and the statistical properties of the transformed
two-point statistics substantially improved under realistic
conditions which include shape noise, a distribution
of source galaxies in redshift, and the effects of masks on the
convergence fields. It will be the subject of follow-up work to
investigate whether this goal can be achieved by improved
modelling, an extended angular frequency range, or by going
beyond transformations of the one-point distribution.
APPENDIX A: CALCULATION OF THE BOX-COX TRANSFORMED POWER SPECTRUM

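In this appendix we detail the calculation of the Box-Cox transformed convergence power spectrum from the statistics of the
original convergence field. We take into account that the convergence has an additive noise component originating from the
random intrinsic ellipticities of galaxies, indicated by a subscript n to $\kappa$. Furthermore we consider the smoothed convergence
$\kappa'(\boldsymbol{\theta}) = [\kappa_{\rm n} * W](\boldsymbol{\theta})$, where W is the Gaussian smoothing kernel. Via Taylor expansion we find

$$
\tilde{\kappa}(\boldsymbol{\theta}) = \frac{a^{\lambda} - 1}{\lambda} + a^{\lambda-1} \left\{ \kappa'(\boldsymbol{\theta}) + \frac{\lambda-1}{2}\, a^{-1}\, \kappa'^2(\boldsymbol{\theta}) + \frac{(\lambda-1)(\lambda-2)}{6}\, a^{-2}\, \kappa'^3(\boldsymbol{\theta}) \right\} + O(\kappa'^4) . \tag{A1}
$$

Except for the irrelevant zeroth-order term the expansion for the case $\lambda = 0$ is identical. In analogy to equation (8) we change
to Fourier space and apply the convolution theorem on the powers of $\kappa'(\boldsymbol{\theta})$. Writing $\kappa'(\boldsymbol{\ell}) = [\kappa_{\rm n} \times W](\boldsymbol{\ell})$, one obtains for the
two-point correlator of the transformed convergence ($\boldsymbol{\ell}, \boldsymbol{\ell}' \neq 0$)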

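$$
\begin{aligned}
\langle \tilde{\kappa}(\boldsymbol{\ell})\, \tilde{\kappa}^*(\boldsymbol{\ell}') \rangle = a^{2(\lambda-1)} \Big\{ & \langle \kappa'(\boldsymbol{\ell})\, \kappa'(-\boldsymbol{\ell}') \rangle + \frac{\lambda-1}{2}\, a^{-1} \int \frac{d^2\ell_1}{(2\pi)^2} \big\{ \langle \kappa'(\boldsymbol{\ell})\, \kappa'(\boldsymbol{\ell}_1 - \boldsymbol{\ell}')\, \kappa'(-\boldsymbol{\ell}_1) \rangle + \langle \kappa'(-\boldsymbol{\ell}')\, \kappa'(\boldsymbol{\ell}_1)\, \kappa'(\boldsymbol{\ell} - \boldsymbol{\ell}_1) \rangle \big\} \\
& + \frac{(\lambda-1)(\lambda-2)}{6}\, a^{-2} \int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2} \big\{ \langle \kappa'(\boldsymbol{\ell})\, \kappa'(-\boldsymbol{\ell}_1)\, \kappa'(-\boldsymbol{\ell}_2)\, \kappa'(\boldsymbol{\ell}_1 + \boldsymbol{\ell}_2 - \boldsymbol{\ell}') \rangle + \langle \kappa'(-\boldsymbol{\ell}')\, \kappa'(\boldsymbol{\ell}_1)\, \kappa'(\boldsymbol{\ell}_2)\, \kappa'(\boldsymbol{\ell} - \boldsymbol{\ell}_1 - \boldsymbol{\ell}_2) \rangle \big\} \\
& + \frac{(\lambda-1)^2}{4}\, a^{-2} \int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2} \langle \kappa'(\boldsymbol{\ell}_1)\, \kappa'(\boldsymbol{\ell} - \boldsymbol{\ell}_1)\, \kappa'(-\boldsymbol{\ell}_2)\, \kappa'(\boldsymbol{\ell}_2 - \boldsymbol{\ell}') \rangle + O(\kappa'^5) \Big\} .
\end{aligned}
\tag{A2}
$$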

Since the convergence is real, we could replace $\kappa^*(\boldsymbol{\ell}) = \kappa(-\boldsymbol{\ell})$. The first four-point term is produced by correlating the first-
and third-order contributions in equation (A1); the second four-point term by correlating the second-order term of the original
convergence with itself. Correlations of higher order than $\kappa'^4$ will be considered in Appendix B.

The noise contribution to the convergence results in an additional scale-independent power spectrum, so that
$\langle \kappa_{\rm n}(\boldsymbol{\ell})\, \kappa_{\rm n}(\boldsymbol{\ell}') \rangle = (2\pi)^2\, \delta_{\rm D}^{(2)}(\boldsymbol{\ell} + \boldsymbol{\ell}')\, [P_\kappa(\ell) + P_{\rm noise}]$. Higher-order statistics are not affected by noise, so that equation (?)
can be used. We apply Wick's theorem to split up the four-point correlators of $\kappa$, arriving at

(k(^) k(-^2) li{£l +£2- £')) = (k(^) k(-^2) +£2- £')), + (k(^) Ac(-^i)> (^(-^2) fc(^l + £2 - £')) 

+ {n{£) k{-£2)) +£2- £')) + (k(^) +£2- £')) (k(-^2) Ac(-^i)) 

= {2-Kf S^''\£ - £') T4£, -£i, -£2,£i+£2 - £) + [2^)" {5^''\£ - £i) 5'^^\£' - £i) P^f) P^{l2) 
+ S^^\£ - £2) 5'^^\£' ~ £2) P.{e.) P.{li) + 5^'^\£ -£'+£1+ £2) 5^^\£i + £2) P^{1) P.{ii)] , (A3) 

and likewise for $\kappa'$ and the other four-point correlators. In each of the power spectrum terms one delta function disappears
after performing one of the angular frequency integrals, turning the other one into $\delta_{\rm D}^{(2)}(\boldsymbol{\ell} - \boldsymbol{\ell}')$. Moreover, after renaming the
remaining integration variable to $\boldsymbol{\ell}_1$, all products of power spectra in equation (A3) can be written as $P_\kappa(\ell)\, P_\kappa(\ell_1)$. Treating
the other correlators analogously, one hence obtains

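$$
\begin{aligned}
\langle \tilde{\kappa}(\boldsymbol{\ell})\, \tilde{\kappa}^*(\boldsymbol{\ell}') \rangle = (2\pi)^2\, \delta_{\rm D}^{(2)}(\boldsymbol{\ell} - \boldsymbol{\ell}')\, a^{2(\lambda-1)} \Bigg\{ & [P_\kappa(\ell) + P_{\rm noise}]\, W^2(\ell)
+ (\lambda-1)\, a^{-1}\, W(\ell) \int \frac{d^2\ell_1}{(2\pi)^2}\, B_\kappa(\ell, \ell_1, |\boldsymbol{\ell} - \boldsymbol{\ell}_1|)\, W(\ell_1)\, W(|\boldsymbol{\ell} - \boldsymbol{\ell}_1|) \\
& + \frac{(\lambda-1)(\lambda-2)}{3}\, a^{-2} \bigg( 3\, [P_\kappa(\ell) + P_{\rm noise}]\, W^2(\ell) \int \frac{d^2\ell_1}{(2\pi)^2}\, [P_\kappa(\ell_1) + P_{\rm noise}]\, W^2(\ell_1) \\
& \qquad + W(\ell) \int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2}\, T_\kappa(\boldsymbol{\ell}, -\boldsymbol{\ell}_1, -\boldsymbol{\ell}_2, \boldsymbol{\ell}_1 + \boldsymbol{\ell}_2 - \boldsymbol{\ell})\, W(\ell_1)\, W(\ell_2)\, W(|\boldsymbol{\ell}_1 + \boldsymbol{\ell}_2 - \boldsymbol{\ell}|) \bigg) \\
& + \frac{(\lambda-1)^2}{4}\, a^{-2} \bigg( 2 \int \frac{d^2\ell_1}{(2\pi)^2}\, [P_\kappa(\ell_1) + P_{\rm noise}]\, W^2(\ell_1)\, [P_\kappa(|\boldsymbol{\ell} - \boldsymbol{\ell}_1|) + P_{\rm noise}]\, W^2(|\boldsymbol{\ell} - \boldsymbol{\ell}_1|) \\
& \qquad + \int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2}\, T_\kappa(\boldsymbol{\ell}_1, \boldsymbol{\ell} - \boldsymbol{\ell}_1, -\boldsymbol{\ell}_2, \boldsymbol{\ell}_2 - \boldsymbol{\ell})\, W(\ell_1)\, W(|\boldsymbol{\ell} - \boldsymbol{\ell}_1|)\, W(\ell_2)\, W(|\boldsymbol{\ell}_2 - \boldsymbol{\ell}|) \bigg) + O(\kappa'^5) \Bigg\} .
\end{aligned}
\tag{A4}
$$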

Note that the Fourier transform of W depends only on the modulus of the angular frequency. Invoking equation (9) again, an
expression for the Box-Cox transformed power spectrum immediately follows, which reduces to equation (10) if smoothing and
noise are neglected. Equation (12) follows from equation (A4) if one defers the connected trispectrum terms to the higher-order
contribution $\mathcal{H}(\ell)$, and if one re-writes the two-dimensional integration over $\boldsymbol{\ell}_1$ as a radial integral over $\ell_1$ and an integral
over the angle $\varphi$ between $\boldsymbol{\ell}$ and $\boldsymbol{\ell}_1$, noting that $|\boldsymbol{\ell} - \boldsymbol{\ell}_1|^2 = \ell^2 + \ell_1^2 - 2 \ell \ell_1 \cos\varphi$.
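The radial-plus-angular form of the integration can be sketched numerically. The following is a minimal illustration, not the implementation used in this work: it evaluates the convolution integral $\int d^2\ell_1/(2\pi)^2\, P(\ell_1)\, P(|\boldsymbol{\ell} - \boldsymbol{\ell}_1|)$ with a simple midpoint rule. The function names, the grid sizes, and the Gaussian toy spectrum of width `S` are assumptions chosen only because the toy case has a closed-form answer to compare against.

```python
import math


def convolve_power(power, ell, ell_max, n_r=400, n_phi=200):
    """Midpoint-rule estimate of int d^2l1/(2pi)^2 P(l1) P(|l - l1|).

    The 2D integral is written as a radial integral over l1 and an
    integral over the angle phi between l and l1.
    """
    d_r = ell_max / n_r
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_r):
        l1 = (i + 0.5) * d_r
        p1 = power(l1)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # |l - l1|^2 = l^2 + l1^2 - 2 l l1 cos(phi)
            sep2 = ell * ell + l1 * l1 - 2.0 * ell * l1 * math.cos(phi)
            total += l1 * p1 * power(math.sqrt(max(sep2, 0.0)))
    return total * d_r * d_phi / (2.0 * math.pi) ** 2


S = 300.0  # width of the toy spectrum (hypothetical value)


def toy_power(ell):
    """Gaussian toy spectrum, used only to validate the integration."""
    return math.exp(-ell * ell / (2.0 * S * S))


def toy_exact(ell):
    """Closed-form convolution of the toy spectrum with itself."""
    return S * S * math.exp(-ell * ell / (4.0 * S * S)) / (4.0 * math.pi)
```

For a realistic convergence power spectrum one would replace `toy_power` by an interpolated model and include the smoothing kernel and noise term as in equation (A4); the integration limits and resolution would then need to be checked for convergence.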
APPENDIX B: MODELLING HIGHER-ORDER CONTRIBUTIONS TO THE TRANSFORMED
POWER SPECTRUM

Including contributions only up to the Gaussian four-point level into the model of the transformed power spectrum results in
significantly biased parameter constraints for both Box-Cox and logarithmic transformations. Hence higher-order terms are
important and need to be included. Since Gaussianising transformations of the convergence using the logarithm perform well
at least at the one-point level, we make the assumption that the original convergence is lognormally distributed, which allows
one to proceed with analytic means.

Under the lognormal assumption we derive in Appendix B1 a closed-form relation between the original and transformed
two-point convergence statistics which, however, still yields poor fits to the simulations. Yet, by means of this relation we
are able to calculate expressions for higher-order correlations of the lognormally distributed convergence in Appendix B2. By
comparing the moments of the convergence obtained from the simulations and from our models in Appendix B3, we normalise
the different contributions to match results from the simulated original convergence fields, arriving at the final expression
(B20) for the higher-order contribution to the transformed power spectrum models.



B1 The lognormal model

A logarithmic transformation renders the one-point distribution of the convergence, both with and without shape noise,
Gaussian to good approximation; see above, and e.g. Taruya et al. (2002). Hence it is reasonable to assume that the
original convergence follows a lognormal distribution, although the non-vanishing higher-order statistics of the logarithmically
transformed fields indicate that the lognormal assumption cannot be perfect.

Analogous to equation (?) for $\lambda = 0$ we write for the transformed convergence $\tilde{\kappa} = \ln(\kappa + a) + N$, where we have now
introduced a normalisation N to ensure that $\tilde{\kappa}$ has vanishing expectation. Solving this equation for $\kappa$ and choosing N such
that $\langle \kappa \rangle = 0$, one obtains

$$
\kappa + a = a\, \exp\left( \tilde{\kappa} - \frac{\sigma_{\tilde{\kappa}}^2}{2} \right) , \tag{B1}
$$

where $\sigma_{\tilde{\kappa}}^2$ denotes the variance of the transformed convergence. Note that throughout this appendix we do not include the
effects of noise and smoothing on the convergence to keep the notation tractable.

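If $\kappa$ is lognormally distributed, $\tilde{\kappa}$ follows a Gaussian distribution, which allows us to compute expectation values analytically
via Gaussian integration. In the case of two-point convergence statistics, this results in

$$
\langle [\kappa(\boldsymbol{x}) + a]\, [\kappa(\boldsymbol{x} + \boldsymbol{\theta}) + a] \rangle = \xi_\kappa(\theta) + a^2 = a^2 \left\langle \exp\left\{ \tilde{\kappa}(\boldsymbol{x}) - \frac{\sigma_{\tilde{\kappa}}^2}{2} \right\} \exp\left\{ \tilde{\kappa}(\boldsymbol{x} + \boldsymbol{\theta}) - \frac{\sigma_{\tilde{\kappa}}^2}{2} \right\} \right\rangle = a^2 \exp\left\{ \xi_{\tilde{\kappa}}(\theta) \right\} , \tag{B2}
$$

where we have made use of $\langle \kappa \rangle = 0$ and the definition of the correlation function $\xi$. Solving for the correlation function of $\tilde{\kappa}$,
one arrives at

$$
\xi_{\tilde{\kappa}}(\theta) = \ln\left[ 1 + a^{-2}\, \xi_\kappa(\theta) \right] = a^{-2}\, \xi_\kappa(\theta) - \frac{1}{2}\, a^{-4}\, \xi_\kappa^2(\theta) + \frac{1}{3}\, a^{-6}\, \xi_\kappa^3(\theta) + O(\xi_\kappa^4) , \tag{B3}
$$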

where the second equality is derived from a Taylor expansion. The first equality provides us with a closed-form relation between
transformed and original correlation functions (see also Hilbert et al. 2011, as well as Coles & Jones 1991 for a similar result),
which can readily be converted into a closed-form relation between transformed and original power spectra as $\xi$ and $P$ are
Hankel transform pairs (e.g. Schneider et al. 2002).

However, although not suffering from the truncation at a certain order in $\kappa$, the lognormal model for the transformed
power spectrum fails to fit the simulations on small scales, as illustrated in Fig. B1 for the LOG2 transformation (we find
similar results for LOG1). It is interesting to note that the lognormal model remains very close to the two-point contribution
$a^{-2} P_\kappa(\ell)$, which implies that the higher-order terms almost cancel each other; see the alternating signs of the expansion in
equation (B3).
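The closed-form relation and its truncated series in equation (B3) are simple to compare numerically. The sketch below is an illustration only; the function names and the value of the correlation function used in the usage note are assumptions, not quantities taken from the simulations.

```python
import math


def xi_transformed(xi_kappa, a):
    """Closed-form lognormal relation of equation (B3): ln(1 + xi / a^2)."""
    return math.log1p(xi_kappa / a**2)


def xi_transformed_series(xi_kappa, a, order=3):
    """Taylor series of the same relation, truncated after `order` terms.

    The alternating signs are those of the expansion of ln(1 + x).
    """
    x = xi_kappa / a**2
    return sum((-1) ** (n + 1) * x**n / n for n in range(1, order + 1))
```

For a fixed (hypothetical) amplitude such as `xi_kappa = 1e-4`, the third-order series is considerably less accurate for a = 0.03 than for a = 0.07, because the expansion parameter is $\xi_\kappa / a^2$; this is the sense in which small shift parameters boost the higher-order terms.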

We ascribe the shortcomings of the lognormal model to the failure of providing a fair representation of the three-point,
four-point and possibly higher-order statistics of the simulated convergence fields. Hence we seek to study the different orders
of an expansion in $\kappa$ individually within the lognormal framework and match the resulting models to the simulations. A
further motivation for this approach is that it will enable us to calculate expressions for arbitrary $\lambda$.

Transforming the expansion in equation (B3) to Fourier space yields

$$
\begin{aligned}
P_{\tilde{\kappa}}(\ell) &= a^{-2}\, P_\kappa(\ell) - \frac{1}{2}\, a^{-4} \int \frac{d^2\ell_1}{(2\pi)^2}\, P_\kappa(\ell_1)\, P_\kappa(|\boldsymbol{\ell} - \boldsymbol{\ell}_1|) + \frac{1}{3}\, a^{-6} \int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2}\, P_\kappa(\ell_1)\, P_\kappa(\ell_2)\, P_\kappa(|\boldsymbol{\ell} - \boldsymbol{\ell}_1 - \boldsymbol{\ell}_2|) + O(P_\kappa^4) \\
&= a^{-2}\, P_\kappa(\ell) - \frac{1}{2}\, a^{-4}\, B(\ell) + \frac{1}{3}\, a^{-6}\, C(\ell) + O(P_\kappa^4) ,
\end{aligned}
\tag{B4}
$$

where we defined the shorthands $B(\ell)$ and $C(\ell)$ for convenience. Note that $B(\ell)$ is the Fourier transform of $\xi_\kappa^2$ and $C(\ell)$ of
$\xi_\kappa^3$. The first term in equation (B4) is identical to the one in equation (10) for $\lambda = 0$, as expected. The second term is of the
same form as one of the Gaussian four-point contributions, but has a different prefactor. As we shall see below, this term
receives additional contributions from the lognormal bispectrum. The third term collects contributions of order $P_\kappa^3$ which are
not included in equation (10), but which may be important particularly for small a. In the next section we will identify and
explicitly calculate all contributions that are third order in the two-point statistic within the lognormal framework.

Figure B1. Same as Fig. (?), bottom panel, but showing the logarithmically transformed power spectrum (LOG2). In addition the model
based on the lognormal prediction is shown as green dotted curve. Note that both modelling approaches fail to match the simulation
data on small scales.



B2 Lognormal higher-order correlations 

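In complete analogy to the steps performed in equation (B2) one can derive relations between the n-point correlations of $\kappa$
and the correlation function $\xi_{\tilde{\kappa}}$. Subsequently applying equation (B3), one can express the n-point correlations in terms of
the two-point statistic of the original convergence field as follows,

$$
\left\langle \prod_{i=1}^{n} [\kappa(\boldsymbol{x}_i) + a] \right\rangle = a^n \exp\left\{ \sum_{i<j}^{n} \xi_{\tilde{\kappa}}(|\boldsymbol{x}_j - \boldsymbol{x}_i|) \right\} = a^n \prod_{i<j}^{n} \left[ 1 + a^{-2}\, \xi_\kappa(|\boldsymbol{x}_j - \boldsymbol{x}_i|) \right] . \tag{B5}
$$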

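In the case of three-point statistics the correlator on the left-hand side can be expanded into

$$
\langle [\kappa(\boldsymbol{x}) + a]\, [\kappa(\boldsymbol{x} + \boldsymbol{\theta}_1) + a]\, [\kappa(\boldsymbol{x} + \boldsymbol{\theta}_2) + a] \rangle = \Gamma_\kappa(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) + a \left\{ \xi_\kappa(\theta_1) + \xi_\kappa(\theta_2) + \xi_\kappa(\theta_{12}) \right\} + a^3 , \tag{B6}
$$

where the shorthand notation $\theta_{ij} = |\boldsymbol{\theta}_j - \boldsymbol{\theta}_i|$ was introduced. We defined $\Gamma_\kappa(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ as the three-point correlation function of
the convergence, i.e. the Fourier transform of the convergence bispectrum. Due to the homogeneity of the convergence field,
two angular vectors suffice to specify $\Gamma_\kappa$; we have not invoked rotational invariance at this stage.
Together with equation (B5), one can derive the lognormal three-point correlation function

$$
\Gamma_{\kappa,{\rm LN}}(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) = a^{-1} \left\{ \xi_\kappa(\theta_1)\, \xi_\kappa(\theta_2) + \xi_\kappa(\theta_1)\, \xi_\kappa(\theta_{12}) + \xi_\kappa(\theta_2)\, \xi_\kappa(\theta_{12}) \right\} + a^{-3}\, \xi_\kappa(\theta_1)\, \xi_\kappa(\theta_2)\, \xi_\kappa(\theta_{12}) , \tag{B7}
$$

which, after Fourier transformation, yields the lognormal bispectrum

$$
B_{\kappa,{\rm LN}}(\boldsymbol{\ell}_1, \boldsymbol{\ell}_2, \boldsymbol{\ell}_3) = a^{-1} \left\{ P_\kappa(\ell_1)\, P_\kappa(\ell_2) + P_\kappa(\ell_1)\, P_\kappa(\ell_3) + P_\kappa(\ell_2)\, P_\kappa(\ell_3) \right\} + a^{-3} \int \frac{d^2\ell_4}{(2\pi)^2}\, P_\kappa(\ell_4)\, P_\kappa(|\boldsymbol{\ell}_1 + \boldsymbol{\ell}_4|)\, P_\kappa(|\boldsymbol{\ell}_2 - \boldsymbol{\ell}_4|) . \tag{B8}
$$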

Note that we use a subscript LN to label expressions that were obtained under the assumption of lognormality of $\kappa$. This
result allows us to compute the third-order integral expression entering equation (10) in the lognormal case,

$$
\int \frac{d^2\ell_1}{(2\pi)^2}\, B_{\kappa,{\rm LN}}(\ell, \ell_1, |\boldsymbol{\ell} - \boldsymbol{\ell}_1|) = a^{-1} \left\{ 2 \sigma_\kappa^2\, P_\kappa(\ell) + B(\ell) \right\} + a^{-3}\, \sigma_\kappa^2\, B(\ell) , \tag{B9}
$$

where we defined the variance of the convergence given by

$$
\sigma_\kappa^2 \equiv \langle \kappa^2 \rangle = \xi_\kappa(0) = \int \frac{d^2\ell}{(2\pi)^2}\, P_\kappa(\ell) . \tag{B10}
$$

Terms second order in $P_\kappa$ contributing to equation (B9) are of exactly the same form as the Gaussian four-point contributions.
Taking the prefactors into account, see equation (10), the terms proportional to $\sigma_\kappa^2 P_\kappa(\ell)$ cancel, while the terms proportional
to $B(\ell)$ reduce to the second-order contribution to equation (B4).
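The cancellation can be made explicit for $\lambda = 0$. The following worked step is a sketch that assumes the prefactors of equation (A4) with smoothing and noise switched off ($W \equiv 1$, $P_{\rm noise} = 0$), keeps only terms of second order in $P_\kappa$, and writes $\sigma_\kappa^2$ for the variance of the convergence:

```latex
\begin{align*}
\text{bispectrum term:} \quad
  & a^{2(\lambda-1)}\, (\lambda-1)\, a^{-1} \cdot a^{-1}
    \left[ 2 \sigma_\kappa^2 P_\kappa(\ell) + B(\ell) \right] \Big|_{\lambda=0}
    = - a^{-4} \left[ 2 \sigma_\kappa^2 P_\kappa(\ell) + B(\ell) \right] , \\
\text{Gaussian four-point terms:} \quad
  & a^{2(\lambda-1)} \left[ (\lambda-1)(\lambda-2)\, a^{-2}\, \sigma_\kappa^2 P_\kappa(\ell)
    + \frac{(\lambda-1)^2}{2}\, a^{-2}\, B(\ell) \right]_{\lambda=0}
    = a^{-4} \left[ 2 \sigma_\kappa^2 P_\kappa(\ell) + \tfrac{1}{2} B(\ell) \right] , \\
\text{sum:} \quad
  & - \tfrac{1}{2}\, a^{-4}\, B(\ell) .
\end{align*}
```

The $\sigma_\kappa^2 P_\kappa(\ell)$ terms drop out and the remainder is the second-order term of equation (B4).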

The procedure spelled out in equations (B5) and (B6) can readily be applied to four-point statistics. Note however
that the four-point correlator splits into connected and unconnected parts, the latter reproducing the Gaussian four-point
contribution, and the former yielding the connected four-point correlation function for which we find

$$
\eta_{\kappa,{\rm LN}}(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \boldsymbol{\theta}_3) = a^{-2} \left\{ \xi_\kappa(\theta_1)\, \xi_\kappa(\theta_2)\, \xi_\kappa(\theta_3) + \xi_\kappa(\theta_1)\, \xi_\kappa(\theta_2)\, \xi_\kappa(\theta_{13}) + \xi_\kappa(\theta_1)\, \xi_\kappa(\theta_2)\, \xi_\kappa(\theta_{23}) + 13\ {\rm perm.} \right\} + O(\xi_\kappa^4) . \tag{B11}
$$

Note that equations (B7) and (B11) are in agreement with the results found by Hilbert et al. (2011).

The corresponding expression for the lognormal trispectrum is readily derived but lengthy, containing also 16 terms that
are third order in $P_\kappa$. Considerable simplification is achieved by performing the two integrals entering equation (10), yielding

$$
\begin{aligned}
\int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2}\, T_{\kappa,{\rm LN}}(\boldsymbol{\ell}, -\boldsymbol{\ell}_1, -\boldsymbol{\ell}_2, \boldsymbol{\ell}_1 + \boldsymbol{\ell}_2 - \boldsymbol{\ell}) &= a^{-2} \left\{ 9 \sigma_\kappa^4\, P_\kappa(\ell) + 6 \sigma_\kappa^2\, B(\ell) + C(\ell) \right\} + O(P_\kappa^4) ; \\
\int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2}\, T_{\kappa,{\rm LN}}(\boldsymbol{\ell}_1, \boldsymbol{\ell} - \boldsymbol{\ell}_1, -\boldsymbol{\ell}_2, \boldsymbol{\ell}_2 - \boldsymbol{\ell}) &= a^{-2} \left\{ 4 \sigma_\kappa^4\, P_\kappa(\ell) + 8 \sigma_\kappa^2\, B(\ell) + 4\, C(\ell) \right\} + O(P_\kappa^4) .
\end{aligned}
\tag{B12}
$$

Note that the lognormal trispectrum and all lognormal n-point correlations with $n \geq 5$ do not contain terms proportional to
$P_\kappa^2$ anymore.

Table B1. Normalisations of contributions to the transformed power spectrum of order
i = 3, ..., 6 in $\kappa$, as determined from the ith moment measured from the 100 simulation
realisations. The second (third) column displays results for the case without (with) shape
noise in the convergence fields. The coefficients $r_4$ and $r_5$ are different because they are
determined from the lognormal prediction which depends on $a_{\rm LN}$, the free parameter in
the lognormal distribution.

order         noise-free       shape noise
$r_3$         1.30 ± 0.04      1.29 ± 0.04
$r_4$         4.06 ± 0.42      5.19 ± 0.50
$r_5$         1.65 ± 0.05      1.42 ± 0.04
$r_6$         1                1
$a_{\rm LN}$  0.03             0.07
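The integer prefactors in equation (B12) can be cross-checked by brute force. The sketch below is an illustration under the product form of equation (B5): every third-order term corresponds to a choice of three distinct point pairs among four points, and the choices that form a closed triangle leave one point isolated and therefore belong to the three-point function rather than to the connected four-point function. The function name and the encoding of point positions are assumptions made for this example.

```python
from itertools import combinations


def connected_cubic_terms(position):
    """Count third-order terms of the connected lognormal four-point function.

    position[i] is 0 if point i sits at x and 1 if it sits at x + theta.
    Returns {k: count}, where k is the number of factors xi(theta) in a
    term; the remaining 3 - k factors are xi(0), i.e. the variance.
    """
    pairs = list(combinations(range(4), 2))
    counts = {}
    for triple in combinations(pairs, 3):
        touched = {point for pair in triple for point in pair}
        if len(touched) < 4:
            # Three pairs on three points: a triangle, not a connected
            # four-point term.
            continue
        k = sum(position[i] != position[j] for i, j in triple)
        counts[k] = counts.get(k, 0) + 1
    return counts
```

With one point at x and three at x + theta, the counts give the coefficients of $\sigma_\kappa^4 P_\kappa$, $\sigma_\kappa^2 B$ and $C$ in the first line of equation (B12); with two points at each position they give the second line. In both cases the counts sum to the 16 terms mentioned in the text.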

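The connected part of the five-point convergence correlation contains terms of at least fourth order in the two-point statistic, so
that we only have to consider the unconnected parts, consisting of products of two- and three-point statistics. Inserting the
terms proportional to $a^{-1}$ in equation (B7) into the five-point analogues of equations (B5) and (B6), one obtains

$$
\langle \kappa(\boldsymbol{x})\, \kappa(\boldsymbol{x} + \boldsymbol{\theta}_1) \ldots \kappa(\boldsymbol{x} + \boldsymbol{\theta}_4) \rangle_{\rm LN} = a^{-1} \left\{ \xi_\kappa(\theta_1) \left[ \xi_\kappa(\theta_{23})\, \xi_\kappa(\theta_{24}) + \xi_\kappa(\theta_{23})\, \xi_\kappa(\theta_{34}) + \xi_\kappa(\theta_{24})\, \xi_\kappa(\theta_{34}) \right] + 9\ {\rm perm.} \right\} + O(\xi_\kappa^4) . \tag{B13}
$$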



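Writing the real-space analogue of the computation done in equation (A2), one realises that only correlators of the form
$\langle \kappa^i(\boldsymbol{x})\, \kappa^j(\boldsymbol{x} + \boldsymbol{\theta}) \rangle$ are required to calculate the transformed convergence power spectrum, i.e. in the five-point case it is
sufficient to determine

$$
\begin{aligned}
\langle \kappa(\boldsymbol{x})\, \kappa^4(\boldsymbol{x} + \boldsymbol{\theta}) \rangle_{\rm LN} = \langle \kappa^4(\boldsymbol{x})\, \kappa(\boldsymbol{x} + \boldsymbol{\theta}) \rangle_{\rm LN} &= a^{-1} \left\{ 6 \sigma_\kappa^2\, \xi_\kappa^2(\theta) + 24 \sigma_\kappa^4\, \xi_\kappa(\theta) \right\} + O(\xi_\kappa^4) ; \\
\langle \kappa^2(\boldsymbol{x})\, \kappa^3(\boldsymbol{x} + \boldsymbol{\theta}) \rangle_{\rm LN} = \langle \kappa^3(\boldsymbol{x})\, \kappa^2(\boldsymbol{x} + \boldsymbol{\theta}) \rangle_{\rm LN} &= a^{-1} \left\{ 6\, \xi_\kappa^3(\theta) + 15 \sigma_\kappa^2\, \xi_\kappa^2(\theta) + 6 \sigma_\kappa^4\, \xi_\kappa(\theta) + 3 \sigma_\kappa^6 \right\} + O(\xi_\kappa^4) .
\end{aligned}
\tag{B14}
$$

Likewise, at the six-point level only the following correlators are needed,

$$
\begin{aligned}
\langle \kappa(\boldsymbol{x})\, \kappa^5(\boldsymbol{x} + \boldsymbol{\theta}) \rangle = \langle \kappa^5(\boldsymbol{x})\, \kappa(\boldsymbol{x} + \boldsymbol{\theta}) \rangle &= 15 \sigma_\kappa^4\, \xi_\kappa(\theta) + O(\xi_\kappa^4) ; \\
\langle \kappa^2(\boldsymbol{x})\, \kappa^4(\boldsymbol{x} + \boldsymbol{\theta}) \rangle = \langle \kappa^4(\boldsymbol{x})\, \kappa^2(\boldsymbol{x} + \boldsymbol{\theta}) \rangle &= 12 \sigma_\kappa^2\, \xi_\kappa^2(\theta) + 3 \sigma_\kappa^6 + O(\xi_\kappa^4) ; \\
\langle \kappa^3(\boldsymbol{x})\, \kappa^3(\boldsymbol{x} + \boldsymbol{\theta}) \rangle &= 6\, \xi_\kappa^3(\theta) + 9 \sigma_\kappa^4\, \xi_\kappa(\theta) + O(\xi_\kappa^4) .
\end{aligned}
\tag{B15}
$$

Note that to third order in $\xi_\kappa$ only the Gaussian terms, i.e. triple products of two-point correlators, contribute to equation
(B15), so that the results do not rely on the lognormal assumption (hence the omission of the subscript LN).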
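The Gaussian six-point coefficients in equation (B15) follow from counting Wick pairings, which is easy to verify by enumeration. The sketch below is illustrative; the function names are assumptions for this example. It lists all ways of grouping the field factors into pairs and records how many pairs connect the two positions x and x + theta.

```python
def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def gaussian_moment(n_x, n_shift):
    """Wick expansion of <kappa^n_x(x) kappa^n_shift(x + theta)>.

    Returns {k: count}: the number of pairings that contain k cross-pairs,
    i.e. k factors xi(theta); all other pairs contribute the variance.
    """
    labels = [0] * n_x + [1] * n_shift
    counts = {}
    for pairing in pairings(list(range(len(labels)))):
        k = sum(labels[i] != labels[j] for i, j in pairing)
        counts[k] = counts.get(k, 0) + 1
    return counts
```

For example, `gaussian_moment(3, 3)` returns six pairings with three cross-pairs and nine with a single cross-pair, matching $6\, \xi_\kappa^3 + 9 \sigma_\kappa^4\, \xi_\kappa$.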

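Noting again the correspondence $\xi_\kappa^2(\theta) \leftrightarrow B(\ell)$ and $\xi_\kappa^3(\theta) \leftrightarrow C(\ell)$, equations (B14) and (B15) are readily transformed to
Fourier space. Extending the calculation of equation (A2), with all necessary terms in the expansion of $\tilde{\kappa}$ given by equation
(A1), we obtain the five- and six-point contribution to $P_{\tilde{\kappa}}$,

$$
\begin{aligned}
P_{\tilde{\kappa}}^{(5+6)}(\ell) = a^{2\lambda-6} \bigg\{ & \frac{(\lambda-1)(\lambda-2)(\lambda-3)}{2} \left[ 4 \sigma_\kappa^4\, P_\kappa(\ell) + \sigma_\kappa^2\, B(\ell) \right] + \frac{(\lambda-1)^2 (\lambda-2)}{2} \left[ 2 \sigma_\kappa^4\, P_\kappa(\ell) + 5 \sigma_\kappa^2\, B(\ell) + 2\, C(\ell) \right] \\
& + \frac{(\lambda-1)(\lambda-2)(\lambda-3)(\lambda-4)}{4}\, \sigma_\kappa^4\, P_\kappa(\ell) + \frac{(\lambda-1)^2 (\lambda-2)(\lambda-3)}{2}\, \sigma_\kappa^2\, B(\ell) + \frac{(\lambda-1)^2 (\lambda-2)^2}{12} \left[ 3 \sigma_\kappa^4\, P_\kappa(\ell) + 2\, C(\ell) \right] \bigg\} + O(P_\kappa^4) .
\end{aligned}
\tag{B16}
$$

Inserting equation (B12) and the $P_\kappa^3$ contribution to equation (B9) into the general expansion given by equation (10), plus
adding the equation above, one can show for $\lambda = 0$ that the terms proportional to $\sigma_\kappa^4 P_\kappa(\ell)$ and $\sigma_\kappa^2 B(\ell)$ cancel, whereas the
terms containing $C(\ell)$ add up to reproduce the third term in equation (B4), as desired. This implies that indeed terms of the
same power in $P_\kappa$ from different orders in $\kappa$ nearly or fully cancel each other under the lognormal assumption, resulting in only
small corrections to the lognormal model. However, if the amplitudes of the higher-order statistics of the simulations are not
well represented by the lognormal model, these cancellations will not occur anymore and cause substantially different signals.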

B3 Moment normalisation 

We compare the connected third and fourth moments of the original convergence fields as determined from the simulations 
and the models we employ. If finding a discrepancy, we normalise the corresponding contributions to the models to match the 
amplitude of the simulations. This implicitly assumes that the angular dependence of the various contributions is modelled 
correctly. We choose the moment as the quantity used for this comparison because it is given by the integration over all 
angular dependencies of the corresponding polyspectrum, thereby testing the models on all relevant scales. 

At the three-point level we continue to use the bispectrum model based on perturbation theory and the fitting formula
by Scoccimarro & Couchman (2001). We calculate the third-order moment from the bispectrum via

$$
\langle \kappa^3 \rangle = \int \frac{d^2\ell_1}{(2\pi)^2} \int \frac{d^2\ell_2}{(2\pi)^2}\, B_\kappa\left[ \ell_1, \ell_2, \ell_3(\ell_1, \ell_2, \varphi) \right] , \tag{B17}
$$

where $\ell_3$ is given by equation (13). The normalisation is then defined as $r_3 = \langle \kappa^3 \rangle_{\rm sim} / \langle \kappa^3 \rangle_{\rm model}$, with the results summarised in
Table B1. The uncertainty quoted in the table originates from the error on the mean simulation moment measured from the
100 realisations. For both the convergence fields with and without shape noise we find that the moment as obtained from the
simulations is about 30 % higher than predicted by the perturbation theory bispectrum model. The sign of this deviation is
in agreement with the underestimation of the bispectrum in the $\Lambda$CDM case by the fit formula, but the deviation is larger than the quoted
15 % average discrepancy. We have also estimated that the uncertainty due to the fit formula should increase with $\ell$, see
Section 4.2, but nonetheless find that a simple rescaling with $r_3$ yields satisfactory fits of the models for $P_{\tilde{\kappa}}$ to the simulations.
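The normalisation step amounts to a ratio of the mean simulated moment to the model moment, with an uncertainty taken from the scatter between realisations. A minimal sketch, assuming the per-realisation moments are available as a list of numbers (the function and argument names are hypothetical):

```python
import math
import statistics


def moment_normalisation(sim_moments, model_moment):
    """Return (r, sigma_r) for one order of the moment comparison.

    r is the mean simulated moment divided by the model moment; sigma_r
    propagates the standard error of the mean over the realisations.
    """
    mean = statistics.fmean(sim_moments)
    standard_error = statistics.stdev(sim_moments) / math.sqrt(len(sim_moments))
    return mean / model_moment, standard_error / abs(model_moment)
```

This treats the model moment as exact, so any uncertainty in the model itself (for instance from the bispectrum fitting formula) is not included in the quoted error.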

At the five-point level we intend to include the unconnected parts, consisting of products of two- and three-point correlation
functions, into the modelling. We refrain from using the perturbation theory bispectrum in this case as this would necessitate
computationally expensive convolutions of power spectra and bispectra. Instead, we work under the lognormal assumption
which allows us to use the simple expressions contained in equation (B16). While the second moments of simulated and
modelled convergence should agree well, and indeed do, we need to match the third moment of the simulation with the
lognormal one, given by

$$
\langle \kappa^3 \rangle_{\rm LN} = \Gamma_{\kappa,{\rm LN}}(0, 0) = 3\, a^{-1} \sigma_\kappa^4 + a^{-3} \sigma_\kappa^6 , \tag{B18}
$$

cf. equation (B7). The lognormal model underestimates the third moment even more strongly than the one based on
Scoccimarro & Couchman (2001), producing $r_5 \approx 1.5$. The normalisation $r_5$ is different for the noise-free convergence fields
and those with shape noise because in the latter case the variance is larger due to the noise, changing the result of equation
(B18). Moreover the parameter a which enters the lognormal models, chosen to render $\tilde{\kappa}$ as close to Gaussian as possible, is
not the same, see Table 1.

Repeating the steps that led to the expressions for the lognormal polyspectra, now setting $\theta_1 = \ldots = \theta_{n-1} = 0$, it is
straightforward, though tedious, to calculate higher moments,

$$
\begin{aligned}
\langle \kappa^4 \rangle_{\rm LN} &= 3\, \sigma_\kappa^4 + 16\, a^{-2} \sigma_\kappa^6 + 15\, a^{-4} \sigma_\kappa^8 + 6\, a^{-6} \sigma_\kappa^{10} + a^{-8} \sigma_\kappa^{12} ; \\
\langle \kappa^5 \rangle_{\rm LN} &= 30\, a^{-1} \sigma_\kappa^6 + 135\, a^{-3} \sigma_\kappa^8 + 222\, a^{-5} \sigma_\kappa^{10} + 205\, a^{-7} \sigma_\kappa^{12} + O(\sigma_\kappa^{14}) ; \\
\langle \kappa^6 \rangle_{\rm LN} &= 15\, \sigma_\kappa^6 + 330\, a^{-2} \sigma_\kappa^8 + 1581\, a^{-4} \sigma_\kappa^{10} + 3760\, a^{-6} \sigma_\kappa^{12} + O(\sigma_\kappa^{14}) .
\end{aligned}
\tag{B19}
$$

The first term contributing to $\langle \kappa^4 \rangle_{\rm LN}$ stems from the unconnected part and corresponds to the Gaussian four-point term. The
remaining terms originate from the trispectrum, where the second one can be identified with equation (B11). Comparing the
connected fourth moments, we obtain $r_4 \approx 4$ for the noise-free convergence and $r_4 \approx 5$ in the case with shape noise, as also
shown in Table B1. This means that the rescaling of the trispectrum contributions to the model is quite substantial and is in
addition associated with a 10 % uncertainty; see Fig. 3 for an illustration of the effect on the transformed power spectrum.
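The integer coefficients of the lognormal moments can be reproduced with a few lines of exact arithmetic. The sketch below is an illustration, not the derivation used above: it assumes the zero-lag limit of equation (B5), $\langle (\kappa + a)^k \rangle = a^k (1 + s)^{k(k-1)/2}$ with $s = \sigma_\kappa^2 / a^2$, and expands $\kappa^n = [(\kappa + a) - a]^n$ binomially. The function name is hypothetical.

```python
from math import comb


def lognormal_moment_coeffs(n, max_power):
    """Coefficients c_m in <kappa^n> / a^n = sum_m c_m s^m, s = sigma^2 / a^2.

    Returned as a list indexed by m = 0 .. max_power. In terms of the
    variance, the term c_m s^m a^n equals c_m a^(n - 2m) sigma^(2m).
    """
    coeffs = [0] * (max_power + 1)
    for k in range(n + 1):
        sign = (-1) ** (n - k)
        exponent = k * (k - 1) // 2  # number of point pairs among k factors
        for m in range(min(exponent, max_power) + 1):
            coeffs[m] += sign * comb(n, k) * comb(exponent, m)
    return coeffs
```

Calling `lognormal_moment_coeffs(4, 6)` gives the five non-zero coefficients of the fourth moment, and the same function with n = 3 gives the two coefficients of the third moment in equation (B18).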

We set $r_6 = 1$ since only Gaussian terms contribute to our models at the six-point level, which should be modelled
accurately. Connected fifth and sixth moments can only be determined with large error bars from the simulation, but as a
tendency we find that they are considerably above the lognormal prediction given by equation (B19), the ratios surpassing
$r \sim 10$. While the simulations thus seem to favour even stronger higher-order contributions, we can still use the lognormal
expressions in equation (B19) for a conservative estimate of how large an error is introduced when truncating the series of
contributions to the transformed power spectrum after terms containing $P_\kappa^3$.

For the smallest value of a we consider, a = 0.03 for LOG2, the term proportional to $a^{-4}$ contributes 12 % to the leading
term of the connected fourth moment; for a = 0.07 (LOG1) this reduces to a 2 % contribution. Similarly, at the five-point level
we find the ratio of next-to-leading over leading term to be 57 % (a = 0.03, LOG2) and 10 % (a = 0.07, LOG1), respectively.
For the sixth moment the higher-order contributions can even dominate, yielding ratios over the first, Gaussian term of 2.76
($\sigma_\kappa^8$ term), 1.66 ($\sigma_\kappa^{10}$ term), and 0.5 ($\sigma_\kappa^{12}$ term) for a = 0.03. We understand these findings as the most likely explanation for
the clear failure of our models in the LOG2 case (see Fig. B1). Due to the small value of a, higher-order terms are boosted,
with particularly strong contributions from positive six-point correlations, which could be the cause of the simulation signal
being high at large $\ell$ compared to the model. All other transformations considered in this work have $a \geq 0.07$, for which the
terms that are not included in the models may not be completely negligible, but are clearly subdominant.

If we want the higher-order modelling to be of practical use for all Box-Cox transformations, we have to distinguish
powers of a originating from the expansion of $\tilde{\kappa}$, e.g. those appearing in equation (10), from those entering via the
lognormal models as in equations (B9) and (B12). Note that we have not yet made this distinction in equation (B16), so
that all contributions of order $P_\kappa^3$ had a prefactor $a^{2\lambda-6}$. We add a subscript LN to a from the lognormal models, keeping
these parameters fixed at $a_{\rm LN} = 0.03$ (noise-free) and $a_{\rm LN} = 0.07$ (shape noise). Collecting all higher-order contributions, and
incorporating the normalisations to the simulation moments, we finally obtain the model term

$$
\begin{aligned}
\mathcal{H}(\ell) &= C_A\, \sigma_\kappa^4\, P_\kappa(\ell) + C_B\, \sigma_\kappa^2\, B(\ell) + C_C\, C(\ell) \quad {\rm with} \\
C_A &= r_4\, a^{2\lambda-4} a_{\rm LN}^{-2}\, (\lambda-1)(4\lambda-7) + r_5\, a^{2\lambda-5} a_{\rm LN}^{-1}\, (\lambda-1)(\lambda-2)(3\lambda-7) + r_6\, a^{2\lambda-6}\, (\lambda-1)(\lambda-2)(\lambda^2 - 5\lambda + 7)/2 ; \\
C_B &= r_4\, a^{2\lambda-4} a_{\rm LN}^{-2}\, 2(\lambda-1)(2\lambda-3) + r_5\, a^{2\lambda-5} a_{\rm LN}^{-1}\, (\lambda-1)(\lambda-2)(3\lambda-4) + r_6\, a^{2\lambda-6}\, (\lambda-1)^2(\lambda-2)(\lambda-3)/2 ; \\
C_C &= r_4\, a^{2\lambda-4} a_{\rm LN}^{-2}\, (\lambda-1)(4\lambda-5)/3 + r_5\, a^{2\lambda-5} a_{\rm LN}^{-1}\, (\lambda-1)^2(\lambda-2) + r_6\, a^{2\lambda-6}\, (\lambda-1)^2(\lambda-2)^2/6 .
\end{aligned}
\tag{B20}
$$

To summarise, $\mathcal{H}(\ell)$ contains contributions from the leading term of the lognormal trispectrum, the unconnected five-point
correlations also in the lognormal framework, and the Gaussian six-point term. Note that the effects of noise and smoothing
still need to be incorporated into equation (B20).
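The coefficients of equation (B20) are straightforward to evaluate, and the lognormal limit offers a consistency check. The sketch below is illustrative and the function name is hypothetical. For $\lambda = 0$, unit normalisations and $a = a_{\rm LN}$, the $\sigma_\kappa^4 P_\kappa$ coefficient should vanish, the $C(\ell)$ coefficient should equal $a^{-6}/3$ as in equation (B4), and the $\sigma_\kappa^2 B$ coefficient should equal $a^{-6}$, which is the amount cancelled by the third-order bispectrum term of equation (B9) that is not part of $\mathcal{H}(\ell)$.

```python
def higher_order_coefficients(lam, a, a_ln, r4, r5, r6=1.0):
    """Return (C_A, C_B, C_C) of equation (B20).

    lam is the Box-Cox parameter, a the shift parameter of the
    transformation, a_ln the fixed lognormal parameter, and r4, r5, r6
    the moment normalisations of Table B1.
    """
    l1, l2 = lam - 1.0, lam - 2.0
    t4 = r4 * a ** (2 * lam - 4) / a_ln**2  # trispectrum prefactor
    t5 = r5 * a ** (2 * lam - 5) / a_ln     # unconnected five-point prefactor
    t6 = r6 * a ** (2 * lam - 6)            # Gaussian six-point prefactor
    c_a = (t4 * l1 * (4 * lam - 7)
           + t5 * l1 * l2 * (3 * lam - 7)
           + t6 * l1 * l2 * (lam**2 - 5 * lam + 7) / 2)
    c_b = (t4 * 2 * l1 * (2 * lam - 3)
           + t5 * l1 * l2 * (3 * lam - 4)
           + t6 * l1**2 * l2 * (lam - 3) / 2)
    c_c = (t4 * l1 * (4 * lam - 5) / 3
           + t5 * l1**2 * l2
           + t6 * l1**2 * l2**2 / 6)
    return c_a, c_b, c_c
```

Keeping `a` and `a_ln` as separate arguments mirrors the distinction drawn in the text between powers of a from the expansion of the transformation and those from the lognormal model.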



APPENDIX C: LIKELIHOOD ESTIMATION BASED ON THE CONVERGENCE AS THE 
DATA-VECTOR 



If one succeeds in transforming convergence maps such that they are close to a Gaussian random field, it becomes advantageous
to compute a likelihood for $\kappa$ instead of the widespread likelihood analysis for two-point statistics. The latter are generally
chosen because two-point statistics provide a certain amount of data compression and are expected to be closer to Gaussian
distributed than the underlying field. However, weak lensing two-point statistics are not accurately modelled by a Gaussian
distribution (see e.g. Hartlap et al. 2009; Schneider & Hartlap 2009) and require the computation of four-point statistics to
obtain covariances (e.g. Takada & Jain 2009; Pielorz et al. 2010). If one treats the convergence itself as the data-vector, the
covariance is a two-point statistic which contains the cosmological information (e.g. Heavens 2003).

In the following we will outline our formalism for the likelihood for $\kappa$ and compare its Fisher information to the one of
the power spectrum likelihood. We continue to work in Fourier space and compose the data-vector of the values $\kappa_{\boldsymbol{\ell}}$ of the
convergence on the grid of a discrete Fourier transformation, defined via

[Equation (C1), the definition of the discrete Fourier coefficients $\kappa_{\boldsymbol{\ell}}$, is not recoverable from the source.]

where $\delta\ell$ denotes the grid spacing of the discrete Fourier transformation such that $A_{\rm field} = (2\pi)^2 / (\delta\ell)^2$ for a square survey
region. The $\kappa_{\boldsymbol{\ell}}$ are Gaussian distributed if the convergence is a Gaussian random field, unless real-world effects such as
masking become important. The covariance of the $\kappa_{\boldsymbol{\ell}}$, averaged over annuli of radius $\ell$ and $\ell'$, with width $\Delta\ell$ each, is given by

[Equation (C2): only the final result, ${\rm cov}_\kappa(\ell, \ell') \propto \delta_{\ell\ell'}\, P_\kappa(\ell)$, is legible in the source.]

where we used the equality

$$
\sum_{\boldsymbol{\ell}_i \in {\rm shell}(\ell)} 1 = \frac{A_{{\rm shell}(\ell)}}{A_{\rm cell}} = \frac{2\pi\, \ell \Delta\ell}{(\delta\ell)^2} = \frac{A_{\rm field}\, \ell \Delta\ell}{2\pi} . \tag{C3}
$$

Here, $A_{{\rm shell}(\ell)}$ denotes the area covered by an annulus and $A_{\rm cell}$ the area of a pixel of the Fourier transformed convergence
map. If the shells over which the average is performed do not overlap, the $\kappa_{\boldsymbol{\ell}}$ are uncorrelated, i.e. the covariance in equation
(C2) is diagonal.

The likelihood for the Gaussian covariance is then given by (e.g. Bond et al. 2000)

[Equation (C4): the Gaussian likelihood $L(\{\kappa\}, p)$, written first in terms of $\det({\rm Cov}_\kappa)$ and ${\rm cov}_\kappa(\ell_i)$ with a sum over the modes in each shell, and then in terms of the model power spectrum $P_\kappa(\ell_i, p)$ and the ratio of the measured to the model power spectrum; the full expression is not recoverable from the source.]

where $N_\ell$ denotes the number of angular frequency bins. To obtain the second equality, we inserted equation (C2) and the
estimator of equation (?). Hence one can continue to measure two-point statistics, in this case power spectra, where the data
determines $\hat{P}_\kappa$ and the dependence on the set of cosmological parameters p enters via the model $P_\kappa(\ell, p)$. Note that shape
noise is not yet included in equation (C4).

To assess the information content in $L(\{\kappa\}, p)$, we compute the Fisher matrix. If the convergence is Gaussian distributed
and the dependence on cosmology is in its covariance, the Fisher matrix reads (Tegmark et al. 1997)

$$
F_{\mu\nu} = \frac{1}{2}\, {\rm Tr} \left[ {\rm Cov}_\kappa^{-1}\, \frac{\partial\, {\rm Cov}_\kappa}{\partial p_\mu}\, {\rm Cov}_\kappa^{-1}\, \frac{\partial\, {\rm Cov}_\kappa}{\partial p_\nu} \right] = \ldots \tag{C5}
$$

[The remaining equalities of equation (C5), a sum over the shells and finally a sum over angular frequency bins of the power spectrum derivatives $\partial P_\kappa(\ell_i) / \partial p_\mu$ weighted by the inverse of a term in curly brackets, are not recoverable from the source.]

where in the last step we again made use of equation (C3). The term in curly brackets is identical to the Gaussian power
spectrum covariance (Joachimi et al. 2008), and thus the information content of $L(\{\kappa\}, p)$ and $L(\{P_\kappa\}, p)$ is the same for a
Gaussian distributed convergence. Illustratively, a Gaussian random field is fully specified by its power spectrum, and therefore
the power spectrum covariance (a four-point statistic) cannot yield additional information. Conversely, if the convergence is not
perfectly Gaussian, e.g. manifested via a non-vanishing connected trispectrum, $L(\{P_\kappa\}, p)$ can still yield accurate constraints
when incorporating the now more complex power spectrum covariance whereas $L(\{\kappa\}, p)$ fails to account for such changes.
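The equality of the two Fisher matrices in the Gaussian case can be illustrated with a toy calculation. The sketch below is not the formalism of this appendix: it assumes a zero-mean Gaussian field with a diagonal covariance whose entries equal the model power spectrum, `n_modes[i]` independent real modes in bin i, and a binned power spectrum with the corresponding Gaussian variance 2 P^2 / N. All function and argument names are hypothetical.

```python
def fisher_field(power, dpower, n_modes):
    """Fisher matrix 0.5 * Tr[C^-1 dC C^-1 dC] for a diagonal covariance.

    power[i] is the model P(l_i), dpower[mu][i] its derivative with
    respect to parameter mu, n_modes[i] the number of modes in bin i.
    """
    n_par = len(dpower)
    return [[0.5 * sum(n * dpower[mu][i] * dpower[nu][i] / power[i] ** 2
                       for i, n in enumerate(n_modes))
             for nu in range(n_par)]
            for mu in range(n_par)]


def fisher_power(power, dpower, n_modes):
    """Fisher matrix of the binned power spectrum, Gaussian covariance only."""
    n_par = len(dpower)
    # Variance of the band-power estimate under the mode-counting assumption.
    variance = [2.0 * p * p / n for p, n in zip(power, n_modes)]
    return [[sum(dpower[mu][i] * dpower[nu][i] / variance[i]
                 for i in range(len(power)))
             for nu in range(n_par)]
            for mu in range(n_par)]
```

Both functions return the same matrix for any input, which is the toy version of the statement that a Gaussian field carries no information beyond its power spectrum. Once a connected trispectrum contributes to the power spectrum covariance, only `fisher_power` could be extended to account for it.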



